DOCUMENT RESUME 



ED 329 593 



TM 016 256 



AUTHOR       Kromrey, Jeffrey D.; Hines, Constance V.
TITLE        Randomly Missing Data in Multiple Regression: An
             Empirical Comparison of Common Missing Data
             Treatments.
PUB DATE     Feb 91
NOTE         67p.; Paper presented at the Annual Meeting of the
             Eastern Educational Research Association (Boston, MA,
             February 13-16, 1991).
PUB TYPE     Reports - Evaluative/Feasibility (142) --
             Speeches/Conference Papers (150)
EDRS PRICE   MF01/PC03 Plus Postage.
DESCRIPTORS  Comparative Analysis; Computer Simulation;
             *Estimation (Mathematics); Mathematical Models;
             *Multiple Regression Analysis; *Sample Size;
             *Statistical Bias
IDENTIFIERS  Bootstrap Methods; *R2 Values; *Randomly Missing Data
             (Regression Analyses)



Randomly Missing Data in Multiple Regression:
An Empirical Comparison of Common Missing Data Treatments



Abstract 



This research is an investigation of the effects of randomly 
missing data in two-predictor regression analyses and the 
differences in the effectiveness of five common treatments of 
missing data on estimates of R² and each of the two standardized 
regression weights. Bootstrap samples of size 50, 100, and 200 
were drawn from three sets of actual field data. Randomly 
missing data were created within each sample and the parameter 
estimates were compared with those obtained from the same 
samples with no missing data. The results indicated that three 
imputation procedures (mean substitution, simple and multiple 
regression imputation) produced biased estimates of R² and both 
regression weights. Two deletion procedures (listwise and 
pairwise) provided accurate parameter estimates with up to 60% 
of the data missing.



Empirical research in any field is frequently hampered by 
missing data, but in no field is the problem more pervasive than 
in the social sciences. Research subjects may fail to respond 
to every item on a survey, students may be absent from classes 
during testing, and questionnaires may be lost or inadvertently 
discarded by either the respondent or the researcher. To this 
list, one must add the considerations of equipment failures, 
illegible handwriting, and miscoded data fields. 

Each time a set of data with missing fields is encountered, 
some type of missing data treatment is mandated. Summary 
statements in research reports such as "the missing data were 
ignored" or "only complete cases were used in the analysis" 
suggest that the explicit treatment of data absences is optional 
in statistical analyses. Such statements are misleading, 
however, because they describe (although implicitly) two types 
of missing data treatments, namely the pairwise and listwise 
deletion procedures. The researcher may be unaware that any 
missing data treatment has taken place, and consequently may be 
unconcerned about the effects of such treatment on any 
subsequent analyses, interpretations, and conclusions. 

Although discourse on methods for dealing with missing data 
is not uncommon in the social science literature, and although 
packaged computer software for applied data analysis typically 
is programmed to treat missing data without explicit user 
directions, little is presented either in research literature or 
in software users' manuals to guide the applied researcher in 
grappling with the problem in any practical manner. The issues 
surrounding the problem of missing data and its treatment are 
presented in surprisingly vague and imprecise terms. Typically, 
the researcher is advised that if data are randomly missing and 
if the amount of missing data is not excessive, then any 
treatment is as good as any other. 

The purpose of this research is to provide practicing 
researchers with practical advice on three missing data issues: 
(a) the extent to which data may be missing before statistics 
are seriously affected, (b) the best treatment to apply to 
missing data matrices, and (c) the effects of missing data and 
their treatment on the statistics interpreted in applied 
research. 

Classifications of Missing Data 

Little and Rubin (1987) distinguished two conceptually different 
types of missing data on a global level. The first type 
constitutes situations in which the underlying value of a 
variable would have been observed had the data collection been 
improved in some way. Nonresponse to surveys, equipment 
failures and lost records are examples of this type of missing 
data. This is contrasted with the second type, situations in 
which a missing data point represents unique information which 
is different from any observed values of the variable. For 
example, a respondent who is unable to indicate a preference 
between products or political candidates represents a new 
response category (i.e., No Preference), not an underlying 
preference which was masked by the occurrence of nonresponse. 

In another broad categorization of missingness, Anderson, 
Basilevsky, and Hum (1983) distinguished situations in which 
data are missing by design, and those in which data are 
inadvertently missing. Of the former type, experimental designs 
such as the Latin-square design are examples. In such designs, 
combinations of independent variables are purposely omitted 
under the explicit assumption that interaction effects are 
negligible. An additional example of data missing by design, 
encountered in survey research, is partial matrix sampling 
(Shoemaker, 1973). The distinguishing feature of data missing 
by design is that the occurrence of missing data is under the 
control of the researcher. The occurrence of inadvertently 
missing data, in contrast, is not under the direct control of 
the researcher. Malfunctioning equipment, non-readable survey 
responses, and student absences on the day of data collection 
are examples of data that are missing inadvertently. 

An additional categorization of types of missing data 
distinguishes missing fields from missing entire records. The 
latter is the problem of nonresponse in sample surveys, while 
the former is the problem of a single variable (or several 
variables) being unobserved in an otherwise complete case. The 
treatment of missing records is typically different from the 
treatment of missing fields within records. 

Finally, the intent of the analysis of data has been used 
to distinguish among types (Frane, 1976). Analyses intended to 
provide estimates of parameters and tests of hypotheses 
regarding the magnitudes of parameters may be subject to 
different missing data treatments than analyses intended to 
estimate derived scores for individual subjects, such as factor 
scores. 

The nature of randomness in missing data has received much 
attention in the consideration of types of missing data. Little 
and Rubin (1987) distinguish between the assumptions of data 
being "Missing at Random" and data being "Observed at Random" 
(see also Rubin, 1976). The first type consists of situations 
in which the observed units are a random subsample of the 
sampled units. The observance of a variable (or conversely, the 
occurrence of a missing data point for a variable) does not 
depend on the value of the variable itself. Data are "Observed 
at Random" if the observed units are a random subsample only 
within classes of some other variable. Thus, missing data on X1 
are correlated with some other variable, X2. Given knowledge of 
X2, however, the missingness on X1 can be made conditionally 
independent. 
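
The two situations can be illustrated with a brief simulation sketch. 
The Python code below is an illustration only and is not part of the 
study reported here; the function names, the cutoff on X2, and the 
deletion probabilities are hypothetical choices. The first function 
deletes values with the same probability for every case. The second 
lets the probability of a missing X1 depend only on the class of X2 
into which a case falls, so that missingness is random within classes. 

```python
import random

def uniform_mask(n, p, rng):
    """Flag each of n cases as missing with the same probability p."""
    return [rng.random() < p for _ in range(n)]

def class_dependent_mask(x2, cutoff, p_low, p_high, rng):
    """Flag X1 as missing with a probability set only by the class of X2.

    Cases with X2 above the (hypothetical) cutoff lose X1 with probability
    p_high, the remaining cases with probability p_low. Within either
    class the missingness is random.
    """
    return [rng.random() < (p_high if v > cutoff else p_low) for v in x2]

def missing_rate(mask):
    """Proportion of cases flagged as missing."""
    return sum(mask) / len(mask)

# Illustrative run on simulated X2 scores (assumed standard normal).
rng = random.Random(1991)
x2 = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mask = class_dependent_mask(x2, cutoff=0.0, p_low=0.1, p_high=0.5, rng=rng)
high = [m for m, v in zip(mask, x2) if v > 0.0]
low = [m for m, v in zip(mask, x2) if v <= 0.0]
print(round(missing_rate(low), 2), round(missing_rate(high), 2))
```

In the second case the overall missing rate differs between the two 
classes of X2, but a researcher who conditions on X2 recovers a random 
pattern of missingness within each class. 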

An Overview of Treatments of Missing Data 

Although the specific methods of treating missing data that 
have been detailed in the literature are numerous, two 
fundamental approaches are evident. In the first general 
approach, missing data are not included in the statistical 
calculations. Entire data records evidencing missing data may be 
deleted (the listwise deletion approach) or observations are 
deleted only if the missing data occur on variables needed for a 
particular calculation (the pairwise deletion approach). In the 
second general approach to missing data treatment, an estimate 
of each missing datum is calculated and the estimated value is 
used in statistical computations. The estimated value may be the 
mean of the variable for the total set of data (the mean 
substitution approach), the mean of a subgroup of the data 
(subgroup mean substitution), the value of the variable occurring 
on a similar data record (the hot-deck approach), or a predicted 
value based upon the relationships among variables in the data. 
This latter prediction of missing data may be based upon the 
regression of the variable with missing data on the single 
variable most highly correlated with it (the simple regression 
estimation approach) or the regression may be computed on all 
variables (the multiple regression approach). Frane (1976) 
provided a lucid critique of the many forms of regression 
approaches to missing data treatment. 
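
The deletion and imputation approaches described above can be sketched 
in a few lines of Python. This is a minimal illustration under stated 
assumptions, not the code used in any study cited here: records are 
tuples, a missing field is coded as None, and all function names are 
hypothetical. Multiple regression imputation is omitted for brevity; it 
follows the same pattern as the simple regression case with more than 
one predictor. 

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

def listwise(rows):
    """Listwise deletion: keep only records with no missing field."""
    return [r for r in rows if None not in r]

def pairwise_r(rows, i, j):
    """Pairwise deletion: correlate columns i and j using every record
    that is observed on both, whatever its status on other columns."""
    pairs = [(r[i], r[j]) for r in rows
             if r[i] is not None and r[j] is not None]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])

def fill_column(rows, col, fill):
    """Return rows with each missing value in column col set to fill(row)."""
    return [tuple(fill(r) if (k == col and v is None) else v
                  for k, v in enumerate(r)) for r in rows]

def mean_substitute(rows, col):
    """Mean substitution: use the mean of the observed values of col."""
    m = mean(r[col] for r in rows if r[col] is not None)
    return fill_column(rows, col, lambda r: m)

def simple_regression_impute(rows, col, pred):
    """Simple regression imputation: predict col from column pred, with
    the regression line fitted to records observed on both columns."""
    obs = [(r[pred], r[col]) for r in rows
           if r[col] is not None and r[pred] is not None]
    mx = mean(x for x, _ in obs)
    my = mean(y for _, y in obs)
    slope = (sum((x - mx) * (y - my) for x, y in obs)
             / sum((x - mx) ** 2 for x, _ in obs))
    return fill_column(rows, col, lambda r: my + slope * (r[pred] - mx))

# Hypothetical records in the order (Y, X1, X2), with one missing X1.
rows = [(1.0, 2.0, 1.0), (2.0, None, 2.0), (3.0, 6.0, 3.0), (4.0, 8.0, 4.0)]
print(len(listwise(rows)))
print(mean_substitute(rows, 1)[1])
print(simple_regression_impute(rows, 1, 2)[1])
```

Note that listwise deletion discards the entire second record, whereas 
pairwise deletion still uses its Y and X2 values when correlating those 
two variables. 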

In contrast to the deletion techniques and the imputation 
techniques, the maximum likelihood approach to missing data 
treatment uses the characteristics of an assumed population 
distribution to provide estimates of the values of parameters 
(typically, a vector of population means and matrix of 
population variances and covariances). The values are selected 
which maximize the likelihood of the observed data, given the 
population distribution. 

Most published considerations of the maximum likelihood 
procedures for treating missing data are found in the technical 
statistics journals (Kariya, Krishnaiah, & Rao, 1983; Dempster, 
Laird, & Rubin, 1977) and are not encountered in the journals of 
applied research. The technical treatments of the procedures, 
although appropriate for the target journals, may reduce their 
appeal to practitioners. The practical utility of maximum 
likelihood estimation procedures may be further reduced because 
the procedures are not available as options in packaged 
statistical analysis programs (a notable exception being the 
BMDP package, Dixon, 1983). 

Comparisons of Missing Data Treatments 

With the breadth of missing data treatments available, 
ranging from default methods on statistical packages to those 
requiring iterative estimations, applied researchers are likely 
to be confused about which methods work best with particular 
data structures and with particular levels of missing data. The 
extant literature on the comparison of missing data treatments 
may leave the researcher with few clear guidelines. 

Research on missing data treatments that will be useful to 
researchers facing a missing data problem should address three 
practical concerns evident in applied data analysis: 

1. The impact of missing data and the effectiveness of 
their treatment must be examined in terms of the 
effects on the parameter estimates that are to be 
interpreted. 

2. The effects should be examined in the context of the 
sampling variation of the parameter estimates. 

3. The data matrices investigated should reflect 
realistic data encountered in actual field research. 

Comparisons Using Computer Generated Data 

Haitovsky (1968) compared listwise and pairwise treatments 
with eight sets of data generated from either multivariate 
normal or uniform distributions. Haitovsky examined the bias 
and variance of each regression weight as criteria of the 
effectiveness of the missing data treatments. The listwise 
approach was found to be superior to the pairwise method in both 
bias and efficiency, although no test of the significance of 
differences was conducted. 

Timm (1970) compared the use of four missing data 
techniques in the computation of correlation matrices and 
variance-covariance matrices. Using samples generated from 
multivariate normal distributions, Timm randomly deleted 1%, 
10%, or 20% of the data. The samples were generated in accord 
with correlation matrices obtained from field research, to 
represent patterns of high, moderate, and low levels of variable 
intercorrelations. The number of variables comprising the 
matrices was controlled at two, five, or 10 variables. The 
missing data treatments were evaluated on the basis of the 
difference between the known population matrix and the matrix 
computed from the treated data. No uniformly best technique was 
observed in the study, although the regression estimation 
technique showed the highest average congruence with the 
population matrices. The design included only three samples 
from each combination of sample size, number of variables, 
proportion missing, and extent of variable intercorrelation. 
Additionally, Timm presented the results as relative 
efficiencies (ratios of the effectiveness of one treatment to 
the effectiveness of another). Such a presentation allows 
comparisons between pairs of methods, but hinders any attempt 
at discerning the degree to which any of the techniques 
reproduced the original population matrices. 

Gleason and Staelin (1975) compared the effectiveness of 
five missing data treatments in reproducing known population 
correlation matrices, using the same measure of differences 
between matrices as Timm (1970). The researchers manipulated 
sample size, number of variables, average magnitude of 
intercorrelation between variables and the proportion of missing 
data. The study provided no replications within cells, i.e., 
only one sample was drawn from each combination of sample size, 
number of variables, average intercorrelation, and proportion 
missing. Additionally, some missing data treatments were 
applied in situations that normally would be inappropriate in 
actual data analysis. The number of variables (three levels: 
10, 15, and 30) was large relative to the number of observations 
(three levels: 50, 100, and 200). The construction of a 
regression equation to predict a missing value using 29 
predictors when the number of cases prior to missing data 
deletions is only 50 must be viewed as a questionable practice 
at best (Frane, 1976). 

Beale and Little (1975) compared six missing data 
treatments in treating samples from computer-generated 
multivariate distributions, produced according to seven patterns 
of correlation. Samples of sizes 50, 100, and 200 were selected 
and random deletions of 5%, 10%, 20%, or 40% of the 
observations on each variable were produced. The criterion of 
the effectiveness of missing data treatments was the percent 
increase in the residual sums of squares (over the complete data 
case), when the complete data were fitted to the obtained 
regression equation. 
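
A criterion of this kind can be sketched as follows. The code is an 
illustration of the percent increase in the residual sum of squares as 
described above, not a reproduction of Beale and Little's program; the 
function names, the two-predictor form, and the example coefficients 
are assumptions made for the sketch. 

```python
def residual_ss(rows, coefs):
    """Residual sum of squares when the complete data rows (y, x1, x2)
    are fitted to the equation y = b0 + b1*x1 + b2*x2."""
    b0, b1, b2 = coefs
    return sum((y - (b0 + b1 * x1 + b2 * x2)) ** 2 for y, x1, x2 in rows)

def percent_increase_ss(rows, treated_coefs, complete_coefs):
    """Percent increase in residual SS of the equation obtained from the
    treated (incomplete) data over the equation obtained from the
    complete data, both evaluated on the complete data."""
    base = residual_ss(rows, complete_coefs)
    return 100.0 * (residual_ss(rows, treated_coefs) - base) / base

# Hypothetical complete data and two hypothetical sets of coefficients.
complete_rows = [(3.1, 1.0, 2.0), (2.9, 2.0, 1.0),
                 (5.2, 2.0, 3.0), (4.8, 3.0, 2.0)]
print(percent_increase_ss(complete_rows, (0.1, 1.0, 1.0), (0.0, 1.0, 1.0)))
```

When the complete-data coefficients are the least squares solution, the 
index cannot be negative, and a value of zero indicates that the 
missing data treatment reproduced the complete-data equation exactly. 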

Beale and Little's results support the use of an iterated 
regression estimation, especially with 40% of the data missing. 
In this situation, only the results for samples of 200 were 
reported. The iterated regression approach resulted in 
increases of SSresid ranging from 1.9% to 24.4%, while the 
multiple regression estimation approach resulted in increases 
ranging from 3.3% to 33.4%. Both approaches performed least 
well in two four-predictor situations in which the population 
value of R² was greater than 0.98. Both procedures performed 
best on a two-predictor model with R² = 0.95. The methods 
diverged in models of moderate values of R² (values from 0.44 to 
0.72), with the iterative approach showing marked improvement 
over the multiple regression approach. 

Donner and Rosner (1982) compared listwise deletion, 
pairwise deletion, regression estimation, and maximum likelihood 
estimation in the simple case of two predictors, one of which 
has missing values. Data were randomly generated to conform to 
several correlation patterns and all were drawn from the 
multivariate normal distribution. Following sample generation, 
the values of one predictor were randomly deleted, yielding 
three levels of missing data (10%, 25%, and 50%). All 
comparisons were made relative to the maximum likelihood 
estimator. The absolute value of the deviation of the maximum 
likelihood estimator from the known regression coefficient was 
compared to the deviation of the other estimators from the 
known. The results were reported as the proportion of such 
comparisons in which the ML procedure deviated more (the 
magnitudes of the deviations were ignored). These comparisons 
were conducted for estimates of the regression coefficients for 
the variable with missing data and the coefficients for the 
variable without missing data. 

Only partial results were reported by the authors. For the 
variable without missing data, 72 comparisons were reported 
(three comparisons between estimators, eight patterns of 
correlation, and three degrees of missing data). Although most 
of the proportions reported favor the maximum likelihood method 
(most were less than 0.5), and the authors interpreted the 
results as supporting the superiority of maximum likelihood 
techniques, only 16 of the 72 comparisons were significantly 
different from a null proportion of 0.5 (constructing 95% 
confidence intervals around the reported proportions). Of 
these, 10 comparisons were between the pairwise deletion 
approach and the maximum likelihood approach. In the estimation 
of the regression parameter of the variable with missing values, 
only the comparisons between the listwise procedure and the ML 
procedure were reported. Of these 72 comparisons, none was 
significantly different from the null proportion. 
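
The confidence interval check described above can be sketched as 
follows. The sketch assumes the usual normal approximation to the 
binomial. The number of replications behind each reported proportion 
is not given in this review, so the value of 100 used below is purely 
hypothetical, as are the function names. 

```python
import math

def proportion_ci(p_hat, n_reps, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    n_reps is the number of comparisons on which p_hat is based
    (an assumed input; z=1.96 gives a 95% interval)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n_reps)
    return p_hat - half_width, p_hat + half_width

def differs_from_half(p_hat, n_reps, z=1.96):
    """True if the interval around p_hat excludes the null value 0.5."""
    lower, upper = proportion_ci(p_hat, n_reps, z)
    return not (lower <= 0.5 <= upper)

# Hypothetical proportions, each based on 100 hypothetical replications.
for p in (0.40, 0.45):
    print(p, proportion_ci(p, 100), differs_from_half(p, 100))
```

Under these assumed values a proportion of 0.40 is distinguishable from 
0.5 but a proportion of 0.45 is not, which shows how proportions that 
all fall below 0.5 may still fail to support a claim of superiority. 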

Kim and Curry (1977) generated 10 data sets of five 
variables, randomly deleted 10% of the observations on each 
variable, and compared pairwise and listwise deletion 
approaches. The index of effectiveness of the missing data 
procedures was the deviation of the zero-order correlation 
coefficient from the total sample value. This deviation was 
averaged over the entire matrix. These researchers, in contrast 
to Haitovsky, found that pairwise deletions were superior to 
listwise. Moreover, the differences between estimated 
correlations and "true" correlations were only slightly greater 
than the sampling variation in the coefficients obtained from 
the complete samples. 

The contradictory results obtained by Kim and Curry (1977) 
and Haitovsky (1968) on the relative effectiveness of listwise 
deletion and pairwise deletion may have little consequence in 
practical applications. Kim and Curry examined data generated 
from a single multivariate distribution and drew only 10 samples 
for their computations, and in neither study were the magnitudes 
of the effects tested for statistical significance. The later 
research by Basilevsky, Sabourin, Hum, and Anderson (1985) and 
Donner and Rosner (1982) suggests that the differences between 
these methods are small and nonsignificant even with much 
greater proportions of data missing. 

Comparisons Using Actual Field Data 

Guertin (1968) compared listwise deletion, mean 
substitution, and regression estimation in computing zero-order 
correlations between student grade point average and each of 10 
achievement tests. The listwise deletion method yielded the 
highest correlation coefficients on 28 of the 50 computed, and 
the regression estimate produced higher correlations than the 
mean substitution method on 34 of the 50 correlations. However, 
lacking criterion values for the correlations, decisions about 
which treatment yielded the most accurate correlation estimates 
cannot be made. 

Raymond (1987) analyzed field data in which missing values 
occurred. He compared listwise deletion, pairwise deletion, and 
regression imputation of missing data on the resulting 
regression equation built from the data matrix. The data 
consisted of 230 cases with 12 variables for each case. One 
hundred seventy-four of the cases presented no missing data (76% 
of the cases complete). 

The magnitudes of the value of R² obtained from pairwise 
deletion and from regression estimation were similar (0.291 and 
0.299, respectively), but both were notably less than that 
obtained from the listwise deletion (0.354). Using a 
stepwise method of equation building, the pairwise method 
yielded only a four variable equation, while the other two 
methods entered five predictors. Such a difference in the 
resulting equations makes comparison of the individual 
regression weights across the methods misleading. Further, the 
comparison of the missing data treatment methods is inhibited 
because a criterion value of R² is not available. Raymond did 
not provide a cross-validation of the three regression equations 
within his research design, so an evaluation of the stability of 
the resulting equations was also not possible. 

Comparisons Incorporating Sampling Variability 

Raymond and Roberts (1987) compared four common missing 
data procedures: listwise deletion, mean substitution, simple 
regression estimation, and iterated regression estimation. 
Using computer generated multivariate normal datasets for three 
predictors and one criterion variable, the researchers compared 
the four techniques while manipulating sample size (with sizes 
of 50, 100, and 200), and the percentage of missing data (2%, 
6%, and 10%). Additionally, the characteristics of the matrix of 
correlations among the four generated variables were manipulated 
to conform to matrices encountered in selection research. The 
efficacy of missing data treatments was indexed by two 
regression criteria, the deviation of R² from that of the 
complete sample and the sum of regression weight deviations from 
those of the complete sample. 

The data were analyzed using analysis of variance. The 
experimental design crossed sample size, percent missing, and 
missing value treatment. The data matrix provided 30 
replications per cell. The datasets generated from each of the 
four correlation matrices were analyzed separately. Significant 
main effects for sample size and proportion missing were 
obtained, as expected (as sample size increases and as the 
proportion of missing data decreases, the effectiveness of any 
missing data treatment is improved). Additionally, as expected, 
the researchers found that the four missing data procedures 
converged as the sample size increased and as the proportion of 
missing data decreased. This corresponds to the textbook 
prescription that if little data are missing, all methods are 
about equally effective. 

In general, the two regression estimation procedures 
(simple regression and iterated regression) were superior to the 
others and the mean substitution procedure was superior to the 
case deletion method. In addition to the accuracy of the 
estimates, the obtained variability of the estimates followed 
the same pattern, i.e., the regression estimates were the most 
consistent and the case deletion method was the least 
consistent. Because case deletion is the typical default 
missing data treatment in multivariate software, the results of 
the Raymond and Roberts study suggest that "ignoring the 
missing data" is not only not the best approach, it may be the 
worst approach. While the obtained effect was evident on either 
criterion measure of effectiveness, the differences between 
missing data treatments were more pronounced on the regression 
weight criterion than on the overall magnitude of R². 

Basilevsky, Sabourin, Hum, and Anderson (1985) compared 
nine missing data treatments using computer generated 
multivariate normal data. In addition to several deletion and 
imputation methods, these authors included several estimation 
techniques based on a principal components analysis (originally 
derived by Dear, 1959). The latter techniques derive the 
largest principal components from the complete data matrix and 
use these components to estimate individual missing data 
elements. 

The researchers controlled sample size (two levels: n=60 
and n=600), degree of collinearity among five predictors (three 
levels: .10, .50, and .90), predictability of the dependent 
measure (three levels: R²=.20, .50, and .90), and extent of 
missing data (three levels: 10%, 30%, and 50% missing). The 
fifth factor (the missing data treatment) was designed as a 
within-subjects factor giving a five-dimensional 
completely-crossed design. Ten replications (different computer generated 
samples) were produced for each cell. Three dependent measures 
were used to evaluate the effectiveness of the missing data 
treatments: deviation of obtained value of R² from actual R², 
average regression weight deviation from the actual values, and 
difference in mean square error from that of the population. 

Unfortunately, the results of the study are difficult to 
assess because of the way in which they were reported. The 
authors asserted that no significant interactions were obtained 
in the study but no ANOVA table was provided in their report. 
It is surprising that the convergence of methods when applied to 
larger samples and samples with greater degrees of completeness 
reported by Raymond and Roberts (1987) was not replicated on 
these data. Additionally, the results were reported in the form 
of Fisher LSD comparisons of each treatment with the treatment 
that gave the least degree of deviation from the complete sample 
values. The significance of the deviation of any missing data 
treatment method from the results obtained for the complete 
sample was not evaluated. Further, the results were reported as 
common logarithms, having been transformed from the original 
values for the analysis of variance. 

While this simulation suggests that the commonly used 
missing data treatments may be superior to more complex 
treatments and are certainly no worse, the artificial nature of 
the simulated data limits its generality (the multivariate 
normal distribution used to generate the data had all pairwise 
zero-order correlations among the predictors equal to each 
other). Further, the lack of clarity in the reported results 
limits the degree of confidence in the outcome. The research 
also failed to indicate how poor any estimate was, so 
conclusions about how much missing data is too much cannot be 
reached. 

Summary 

The empirical comparisons of missing data treatments are of 
limited utility to the applied researcher, not because the 
results are contradictory (although such contradictions are 
evident in the research summarized here), but because the 
critical concerns of the applied researcher have been 
inadequately addressed. 

First, the effectiveness of a missing data treatment must 
be evaluated against a criterion. The early report of Guertin 
(1968) and the more recent work of Raymond (1987) showed that 
different missing data treatments lead to different values for 
computed statistics but allow no method of determining which 
value of the statistic is closest to the truth (either the 
parameter being estimated or the sample value of the statistic 
which would have been obtained if no data were missing). The 
work of Timm (1970), Donner and Rosner (1982), and Raymond and 
Roberts (1987) evaluated effectiveness in terms of deviation 
from a criterion. When such results are presented as ratios, 
the relative effectiveness of treatments can be addressed but an 
index of the absolute effectiveness of any treatment is lost. 
Beale and Little (1975) reported results as direct deviations 
from a criterion without creating ratios, a presentation which 
allows the reader to judge the absolute effectiveness of a 
treatment for a given missing data situation. 

The choice of a criterion is a closely related issue. Timm 
(1970) and Gleason and Staelin (1975) used an index of agreement 
between correlation matrices as the criterion of effectiveness. 
In much applied research, the correlation matrix is only a 
preliminary step in the analysis. The use of regression 
coefficients and associated statistics as criteria more closely 
addresses the effects of treatments on statistics which are likely 
to be critical to a research project. Beale and Little's (1975) 
criterion of the percent increase in SSresid provides an index 
of the overall quality of a regression equation. Most applied 
research, however, involves the interpretation of regression 
weights and magnitudes of R². The effects of missing data 
treatments on these statistics may not be proportional to those 
observed on sums of squares, as suggested by the research of 
Basilevsky et al. (1985). 

Second, the manifest differences among outcomes must be 
compared with differences likely to arise due to chance. 
Although some researchers have reported tests of hypotheses 
regarding differences among treatments (Basilevsky et al., 1985; 
Raymond and Roberts, 1987), no test of the significance of the 
difference between a treated matrix and the complete sample 
matrix has been reported. Kim and Curry (1977) reported that 
the estimated values obtained from their treated matrices were 
not notably discrepant from the variation in complete data 
statistics resulting from sampling. 

Finally, the treatments must be applied to realistic 
situations. Comparisons based on actual field data (Guertin, 
1968; Raymond, 1987) or on simulations based upon matrices 
obtained from field data (Timm, 1970; Raymond & Roberts, 1987) 
provide a better index of effectiveness than simulations based 
upon matrices not encountered in the real world (Basilevsky et 
al., 1985). 

In general, simulation studies have dealt with idealized 
data produced by random number generators following an exact 
mathematical model (typically the multivariate normal 
distribution). Any actual field data naturally violate 
distributional assumptions to some extent. Further, the studies 
have failed to provide answers to the critical questions facing 
applied researchers concerning the selection of treatments and 
the amount of missing data. An examination of the effects of 
missing data treatments with actual field data rather than with 
computer generated data provides a useful extension and test of 
the real-world applicability of the simulations. Attention to 
the significance of differences between results obtained from 
the incomplete samples and those obtained from the complete 
samples will potentially yield insights not suggested by the 
previous research. 

Method 

Bootstrap samples were drawn from three large sets of 
actual field data, representing three types of data commonly 
encountered in social research: achievement test data, opinion 
rating scales (Likert rating data), and factor score scales 
(psychological trait data). Each sample was analyzed as a 
two-predictor regression model. Descriptive statistics on the 
variables comprising the three sets of data are presented in 
Table 1, and the regression models computed on these 
pseudo-populations are presented in Table 2. 
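
For a two-predictor model, the standardized regression weights and R² 
can be obtained directly from the three zero-order correlations. The 
sketch below uses the standard textbook formulas; the function name 
and the example correlations are hypothetical and do not come from 
Tables 1 or 2. 

```python
def two_predictor_model(r_y1, r_y2, r_12):
    """Standardized weights and R-squared for Y regressed on X1 and X2.

    r_y1, r_y2: correlations of Y with X1 and with X2.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared

# Hypothetical correlations, for illustration only.
print(two_predictor_model(0.5, 0.5, 0.5))
```

Because the computation requires only correlations, it applies equally 
to a correlation matrix assembled by pairwise deletion and to one 
computed from a listwise-deleted or imputed data matrix. 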

From each data set, 100 samples of each size (50, 100, and 200) 
were drawn with replacement. Within each of the samples of each 
size, a proportion of the observations were randomly selected 
and assigned missing values in lieu of the existing values of 
one predictor variable. One hundred samples were examined at 
each of six levels of missing data: 10%, 20%, 30%, 40%, 50%, and 
60% missing. Finally, the 100 samples of each size were examined 
with no missing data. 
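
The sampling and deletion steps can be sketched as follows. This is a 
minimal illustration rather than the program used in the study: the 
pseudo-population is simulated here instead of being drawn from field 
data, the function names are hypothetical, and rounding the number of 
deleted values to the nearest integer is an assumption. 

```python
import random

def bootstrap_sample(population, n, rng):
    """Draw n records from the pseudo-population with replacement."""
    return [rng.choice(population) for _ in range(n)]

def delete_at_random(sample, col, proportion, rng):
    """Return a copy of sample in which a randomly selected proportion
    of the values in column col is replaced by None (missing)."""
    n_missing = round(proportion * len(sample))
    chosen = set(rng.sample(range(len(sample)), n_missing))
    return [tuple(None if (i in chosen and k == col) else v
                  for k, v in enumerate(row))
            for i, row in enumerate(sample)]

# Simulated stand-in for a field data set; records are (Y, X1, X2).
rng = random.Random(1991)
population = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
              for _ in range(1000)]

for n in (50, 100, 200):
    complete = bootstrap_sample(population, n, rng)
    for proportion in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
        incomplete = delete_at_random(complete, 1, proportion, rng)
        n_none = sum(1 for row in incomplete if row[1] is None)
        print(n, proportion, n_none)
```

Deleting values from the same bootstrap sample at each level of 
missing data, as in this sketch, keeps the complete sample available 
as the criterion against which each treated sample is compared. 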

Treatments of the missing value data sets based upon 
listwise deletion, pairwise deletion, mean imputation, simple 
regression imputation, and multiple regression imputation were 
computed and the resulting regression parameters were compared 
with those of the 100 samples with no missing data. The details 
of these missing data treatments (MDTs) have been extensively 
described elsewhere (e.g., Kim & Curry, 1977), and will not be 
repeated here. 
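
For readers who want a concrete picture of the five treatments, the following Python sketch computes standardized regression weights and R² from the correlation matrix each treatment produces. It is an illustration under assumptions, not the procedure used in the study: the data are synthetic, simple regression imputation is assumed to predict the incomplete predictor from the other predictor only, and multiple regression imputation is assumed to use both the other predictor and the criterion. All function names are hypothetical.

```python
import numpy as np

def betas_and_r2(R):
    """Standardized weights and R^2 from a correlation matrix ordered (y, x1, x2)."""
    beta = np.linalg.solve(R[1:, 1:], R[1:, 0])
    return beta, float(beta @ R[1:, 0])

def listwise(data):
    """Correlations computed from rows with no missing values."""
    complete = data[~np.isnan(data).any(axis=1)]
    return np.corrcoef(complete, rowvar=False)

def pairwise(data):
    """Each correlation computed from the rows available for that pair."""
    k = data.shape[1]
    R = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            ok = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            R[i, j] = R[j, i] = np.corrcoef(data[ok, i], data[ok, j])[0, 1]
    return R

def mean_imputation(data, col):
    """Replace missing values in `col` with the mean of the observed values."""
    out = data.copy()
    out[np.isnan(out[:, col]), col] = np.nanmean(out[:, col])
    return np.corrcoef(out, rowvar=False)

def regression_imputation(data, col, predictors):
    """Replace missing values in `col` with least-squares predictions."""
    out = data.copy()
    miss = np.isnan(out[:, col])
    X_obs = np.column_stack([np.ones((~miss).sum()), out[~miss][:, predictors]])
    coef, *_ = np.linalg.lstsq(X_obs, out[~miss, col], rcond=None)
    X_mis = np.column_stack([np.ones(miss.sum()), out[miss][:, predictors]])
    out[miss, col] = X_mis @ coef
    return np.corrcoef(out, rowvar=False)

# Synthetic illustration (not the study's data): columns are y, x1, x2.
rng = np.random.default_rng(1)
x2 = rng.normal(size=200)
x1 = 0.6 * x2 + rng.normal(scale=0.8, size=200)
y = 0.4 * x1 + 0.3 * x2 + rng.normal(size=200)
full = np.column_stack([y, x1, x2])
data = full.copy()
data[rng.choice(200, size=60, replace=False), 1] = np.nan  # 30% missing on x1

treatments = {
    "listwise": listwise(data),
    "pairwise": pairwise(data),
    "mean": mean_imputation(data, col=1),
    "simple regression": regression_imputation(data, col=1, predictors=[2]),
    "multiple regression": regression_imputation(data, col=1, predictors=[0, 2]),
}
for name, R in treatments.items():
    beta, r2 = betas_and_r2(R)
    print(f"{name:20s} beta={np.round(beta, 3)} R2={r2:.3f}")
```

Working from the correlation matrix keeps the five treatments on a common footing, since pairwise deletion does not yield a single rectangular data set to which ordinary least squares could be applied directly.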

Data Analysis 

This experimental study represents a 3 × 3 × 6 × 5 design, 
with two between-subjects factors (parent population and sample 
size) and two within-subjects factors (proportion of data 
missing and missing data treatment method). The dependent 
variables analyzed were the sample estimates of R² and each of 
the two standardized regression coefficients. The data were 
analyzed by computing the effect sizes obtained from the missing 
data treatment conditions relative to the complete sample 
condition.¹ 
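
The size of this design can be made explicit by enumerating its cells. The sketch below is only an illustration; the label strings are shorthand introduced here.

```python
from itertools import product

data_sets = ["achievement", "psychological trait", "Likert rating"]
sample_sizes = [50, 100, 200]
missing_levels = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
treatments = ["listwise", "pairwise", "mean",
              "simple regression", "multiple regression"]

# One cell per combination of the four design factors.
cells = list(product(data_sets, sample_sizes, missing_levels, treatments))
print(len(cells))  # 270 treated cells

situations_per_treatment = len(cells) // len(treatments)
print(situations_per_treatment)  # 54

situations_per_treatment_and_data_set = situations_per_treatment // len(data_sets)
print(situations_per_treatment_and_data_set)  # 18
```

Each treatment is therefore evaluated in 54 situations overall, or 18 within a single data set, in addition to the nine complete-data reference cells (three data sets by three sample sizes).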

Results 

The cell means and standard deviations of the obtained 
values of R², the regression weight of the variable with missing 
data, and the regression weight of the variable without missing 
data are presented in Tables 3, 4, and 5, respectively. 

Effects of Missing Data on R² 

The cell means for values of R² are presented in Figures 1, 
2, and 3. Three trends in these data are evident in the figures. 
First, the differences among the MDTs increase as the proportion 
of missing data increases, an effect which is anticipated based 
upon previous empirical research on randomly missing data (e.g., 
Gleason & Staelin, 1975; Raymond & Roberts, 1987). Second, the 
use of larger sample sizes does not substantively ameliorate the 
effect of missing data on the estimates of R². The effects of 
the missing data and their treatment are relatively stable 
across the sample sizes examined. Finally, differences in the 
effectiveness of the MDTs are evident. 

¹ Analyses of variance were computed on these data. Because of the large sample size, all 
effects and interactions were statistically significant. To conserve space, the details 
of these analyses are not presented here, but are available upon request from the authors. 

The use of multiple regression imputation consistently 
yields overestimates of R². Conversely, the use of mean 
imputation consistently yields underestimates of R². The simple 
regression imputation procedure overestimates R² only in the 
psychological trait data, where the overestimation is 
consistent. In the other two sets of data, the simple regression 
imputation procedure underestimates R², with the exception of 
the samples of size 50 in the achievement data, where a slight 
overestimation is evident. The use of listwise deletion 
typically overestimates R² (in some exceptions, the listwise 
procedure underestimates the value of R², but the effect is very 
small). The pairwise deletion procedure shows a similar 
overestimation of R², with some instances of underestimation. 
Although no MDT yields consistently best estimates of R² across 
the data sets, sample sizes, and levels of missing data 
examined, the listwise and pairwise deletion procedures perform 
better in most situations than the use of mean imputation and 
the two regression imputation techniques. 

Effects on Beta for the Variable with Missing Data 

The cell means obtained for the values of the regression 
weight of the variable with missing data are plotted in Figures 
4, 5, and 6. The divergence in the resulting values attributed 
to the MDTs that was evident in the values of R² is also evident 
in the values of these regression weights. The use of multiple 
regression imputation consistently overestimates this regression 
weight, and the use of mean imputation consistently 
underestimates it. Simple regression imputation yields an 
overestimate in the psychological trait data, an underestimate 
in the achievement data, and virtually no effect in the Likert 
rating data. The pairwise deletion procedure yields small and 
inconsistent overestimates or underestimates. The listwise 
deletion procedure yields a small, consistent underestimate of 
this regression coefficient in the achievement data, and a small 
but inconsistent effect in the psychological trait data and 
Likert rating data. 

Effects on Beta for the Variable without Missing Data 

The cell means obtained for the values of the regression 
weight of the variable without missing data are presented in 
Figures 7, 8, and 9. The divergence of values evident with 
increases in the amount of missing data is also apparent in 
these figures. However, the directions of the effects obtained 
from the missing data procedures are the opposite of those 
obtained for the other regression weight. The use of multiple 
regression imputation underestimates this regression weight, and 
the use of mean imputation overestimates it. Simple regression 
imputation yields an underestimate in the psychological trait 
data, and small, inconsistent effects in the achievement data 
and Likert rating data. The pairwise deletion procedure yields a 
small underestimate in the Likert rating data, but the direction 
of the effect is inconsistent in the achievement data and in the 
psychological trait data. The listwise deletion procedure yields 
small effects (inconsistent in direction) in all three types of 
data. 



Discussion 

As an aid to interpretation of these data, the cell means 
were transformed to effect sizes, according to the formula: 

$$ES_{hijk} = \frac{\bar{X}_{hijk} - \bar{X}_{h00k}}{S_{h00k}}$$

where 

$ES_{hijk}$ = the effect size in data set h, for missing data 
treatment i, with proportion of missing data j 
and sample size k. 

$\bar{X}_{hijk}$ = the obtained mean in data set h, for missing 
data treatment i, with proportion of missing 
data j and sample size k. 

$\bar{X}_{h00k}$ = the obtained mean in data set h, for the 100 
samples of size k with no missing data. 

$S_{h00k}$ = the obtained standard deviation in data set h, 
for the 100 samples of size k with no missing data. 
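
A minimal Python sketch of this transformation, starting from the replicate estimates in one cell, is given below. The replicate values are simulated for illustration only, the function name `effect_size` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the paper does not state which divisor was used.

```python
import numpy as np

def effect_size(treated_estimates, complete_estimates):
    """(mean under a treatment - complete-data mean) / complete-data SD."""
    treated = np.asarray(treated_estimates, dtype=float)
    complete = np.asarray(complete_estimates, dtype=float)
    # Assumption: sample standard deviation (ddof=1) of the complete-data estimates.
    return (treated.mean() - complete.mean()) / complete.std(ddof=1)

rng = np.random.default_rng(2)
# Simulated stand-ins for 100 complete-data estimates and 100 treated estimates.
complete = rng.normal(loc=0.60, scale=0.10, size=100)
treated = complete + 0.05  # a treatment that shifts every estimate upward by 0.05
print(round(effect_size(treated, complete), 2))
```

Dividing by the complete-data standard deviation expresses the bias of a treatment in units of ordinary sampling variability at that sample size.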

The effect sizes for the values of R² and each regression 
weight are presented in Tables 6, 7, and 8. To summarize the 
results obtained for the three sets of data, the obtained effect 
sizes were classified as significant or non-significant, from a 
practical perspective, on the basis of their magnitude. Effect 
sizes with absolute values less than 0.3 were considered to 
present no practical problem for the researcher, and those with 
absolute values greater than or equal to 0.3 
were considered large enough to distort the interpretation of 
the regression. The criterion of 0.3 is somewhat more conservative 
than the 0.5 value recommended by Light and Pillemer (1984) in 
their consideration of "noticeable" effects. The more 
conservative criterion is recommended because, in contrast to 
the context in which Light and Pillemer were working, the 
regression parameters are likely to be subject to both a 
substantive interpretation and a test of statistical 
significance. A summary of this analysis of effect sizes is 
presented in Table 9. 
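
The classification rule can be illustrated with one row of Table 3. The sketch below uses the R² cell means for multiple regression imputation in the achievement data at n = 50 (complete-data mean 0.605, standard deviation 0.100); because it works from the rounded table entries, the resulting effect sizes may differ slightly from those the authors report in Table 6.

```python
import numpy as np

CRITERION = 0.3  # practical-significance criterion adopted in the paper

# Table 3, achievement data, n = 50, multiple regression imputation.
complete_mean, complete_sd = 0.605, 0.100
treated_means = np.array([0.609, 0.613, 0.622, 0.633, 0.641, 0.659])  # 10% ... 60% missing

effect_sizes = (treated_means - complete_mean) / complete_sd
flagged = np.abs(effect_sizes) >= CRITERION

print(np.round(effect_sizes, 2))
print(int(flagged.sum()), "of", flagged.size)  # 2 of 6
```

In this row only the 50% and 60% conditions reach the criterion; percentages of the kind summarized in Table 9 are counts of such flagged cells divided by the number of situations examined.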

In this table, the differences in performance of the 
missing data treatments are particularly evident. Specifically, 
the use of the mean substitution provided effect sizes greater 
than 0.3 in 61% of the situations examined in the estimation of 
r2, in 93% of the situations in the estimation of the regression 
weight for the missing data variable, and in 78% of the 
situations in the estimation of the regression weight for the 
variable with no missing data. 

A more detailed examination of the performance of the mean 
substitution technique shows substantial variations according to 
the data set analyzed. In the achievement data, only 17% of the 
estimates of r2 exceeded the 0.3 effect size limit, but 94% of 
the estimates of each regression weight exceeded this limit. In 
the psychological trait data, 72% of the r2 estimates exceeded 
the limit, as did 83% of the estimates of the regression weight 
for the variable with missing data. However, only 39% of the 
estimates of the regression weight for the variable without 
missing data exceeded the limit. Finally, in the Likert rating 
data, 94% of the R² estimates exceeded the effect size limit, as 
did all of the regression weight estimates (for both the 
variable with missing data and the variable without missing 
data). 

Similarly, most of the effect sizes obtained with the use 
of the multiple regression imputation technique exceeded the 0.3 
criterion. Thirty-nine percent of the effect sizes for R² 
exceeded 0.3 in the achievement data, 61% in the Likert 
rating data, and 83% in the psychological trait data. More than 
two-thirds of the regression weight effect sizes exceeded 0.3. 

The simple regression procedure performed inconsistently in 
this analysis. None of the estimates of r2 exceeded the effect 
size limit of 0.3 in the achievement data, but 28% exceeded this 
limit in the Likert rating data, and 78% exceeded the limit in 
the psychological trait data. In estimating the regression 
weight for the predictor with missing data, none of the 
estimates for the Likert data and only 11% of the estimates for 
the achievement data exceeded the effect size limit. However, 
83% of the estimates for the psychological trait data exceeded 
this limit. The performance of the simple regression imputation 
procedure was better for estimation of the regression weight of 
the variable without missing data. For this regression weight, 
28% of the estimates exceeded the effect size limit in the 
psychological trait data and none of the estimates exceeded the 
limit for the other two types of data. 

The two deletion procedures yielded more accurate estimates 
of r2 and both regression parameters than any of the imputation 
procedures. In the estimation of r2, both the pairwise deletion 
approach and the listwise deletion approach yielded estimates 
beyond the 0.3 effect size limit in only 4% of the situations 
examined. In the estimation of regression weights, the listwise 
procedure performed slightly better than the pairwise deletion 
procedure. None of the estimates of the regression weights 
exceeded the 0.3 effect size limit with the listwise deletion 
approach, and only 4% of the estimates exceeded this limit with 
the pairwise approach. However, the missing data situations in 
which estimates of R² and regression weights exceeded these 
limits were those in which at least 50% of the data were 
missing. For missing data conditions less severe than 50% 
missing, neither deletion procedure yielded effect sizes greater 
than 0.3. 

Although of less concern than bias in the estimates 
resulting from missing data and their treatment, differential 
increases in the sampling variability of the parameter estimates 
are also evident in these data. To assist in the interpretation 
of these effects, ratios of the standard deviation of each 
regression statistic to the standard deviation obtained from 
the complete data samples were computed: 

$$\text{SD Ratio}_{hijk} = \frac{S_{hijk}}{S_{h00k}}$$

where 

$\text{SD Ratio}_{hijk}$ = the standard deviation ratio in data 
set h, for missing data treatment i, with proportion of 
missing data j and sample size k. 

$S_{hijk}$ = the obtained standard deviation in data set h, 
for missing data treatment i, with proportion of 
missing data j and sample size k. 

$S_{h00k}$ = the obtained standard deviation in data set h, 
for the 100 samples of size k with no missing data. 
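
As a brief illustration, the ratio can be computed from two entries of Table 3: the standard deviation of R² under simple regression imputation in the psychological trait data at n = 50 with 60% missing (0.177) and the corresponding complete-data standard deviation (0.094). The function name `sd_ratio` is introduced here only for the sketch.

```python
def sd_ratio(sd_treated, sd_complete):
    """Ratio of a treatment's standard deviation to the complete-data standard deviation."""
    return sd_treated / sd_complete

# Table 3: psychological trait data, n = 50, simple regression imputation, 60% missing.
ratio = sd_ratio(0.177, 0.094)
print(round(ratio, 2))  # 1.88
```

A ratio near 1 indicates that the treatment leaves sampling variability essentially unchanged; a ratio near 2 indicates that the standard deviation has roughly doubled.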

The standard deviation ratios for the estimates of R², the 
regression weight for the variable with missing data, and the 
regression weight for the variable without missing data are 
presented in Tables 10, 11, and 12, respectively. The largest 
increases in the variability of R² are evident with the listwise 
deletion procedure and the two regression imputation techniques. 
As anticipated, the variability increases with the proportion of 
missing data. At the most extreme (simple regression imputation 
with samples of psychological trait data), the standard 
deviation of R² is twice as large as that obtained with complete 
data samples. 

The increases in variability of the regression weights 
(Tables 11 and 12) are larger in magnitude than those associated 
with R² (Table 10), in some instances (i.e., multiple regression 
imputation in the samples of achievement data) becoming three 
times as large as the variability in the complete data samples. 
The only missing data treatment that is not associated with 
increases in variability is the mean imputation technique. 
However, the extent of bias evident with the mean imputation 
procedure renders its resistance to variability inflation of 
secondary importance. 

In conclusion, the three imputation techniques examined in 
this study (multiple regression imputation, simple regression 
imputation, and mean imputation) did not perform well when 
applied to situations of actual field data presenting randomly 
missing values. Even with as little as 10% of the data missing, 
the imputation procedures can yield biased estimates with effect 
sizes greater than 0.3. An exception may be evident for the 
simple regression imputation procedure when the correlation 
between the predictors is very high, and the focus is on the 
magnitude of regression weights rather than on R². However, 
caution should be taken in using this technique. In situations 
where the simple regression imputation procedure was 
ineffective, the resulting values were extremely biased. 

In contrast, the deletion procedures appear to yield 
results that are not appreciably different from those obtained 
in sets of data without missing data. Furthermore, the 
effectiveness of the deletion procedures is maintained 
throughout the range of missing data examined in this study. 
Even when more than half of the data are missing, the deletion 
procedures typically yield accurate estimates of R² and 
regression weights. The increase in the variability of the 
statistics evidenced in the missing data analyses implies that 
the researcher should make an adjustment to standard errors when 
testing hypotheses and constructing confidence intervals. When 
the level of missing data reaches 30%, an increase in the 
standard error of approximately 50% should provide a 
conservative adjustment for this increase in variability. 
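
One way to apply this recommendation is sketched below in Python. The estimate and standard error are invented for illustration, the function name `adjusted_interval` is hypothetical, and leaving the standard error unadjusted below 30% missing is an assumption of the sketch rather than a rule stated in the paper.

```python
def adjusted_interval(estimate, standard_error, proportion_missing,
                      inflation=1.5, threshold=0.30, critical=1.96):
    """Confidence interval with the standard error inflated at or above the threshold."""
    # Assumption: no adjustment is applied when less than `threshold` of the data is missing.
    if proportion_missing >= threshold:
        standard_error = standard_error * inflation
    return estimate - critical * standard_error, estimate + critical * standard_error

# Hypothetical regression weight of 0.25 with a nominal standard error of 0.10.
low, high = adjusted_interval(0.25, 0.10, proportion_missing=0.30)
print(round(low, 3), round(high, 3))  # -0.044 0.544
```

With the inflated standard error the interval includes zero, whereas the unadjusted interval (0.054 to 0.446) would not, which shows how the adjustment guards against overstating significance.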

Without suggesting that applied researchers become 
complacent about missing data problems, this research provides 
empirical support for the use of certain MDTs and for the 
avoidance of other MDTs. The differences in the effectiveness of 
the treatments across the three types of data examined in this 
study highlight the need for further research to identify the 
types of data matrices that may be amenable to analysis by these 
MDTs. Equally important to the generalizability of the results 
is the consideration of regression models with more predictor 
variables, and matrices in which missing data occur on more than 
one predictor. Finally, additional research on variations in the 
nature of the missing data mechanism (i.e., nonrandomly missing 
data; Kromrey & Hines, 1990) will provide empirical support for 
the use of MDTs in situations for which the critical assumption 
of randomness is untenable. 

Table 1 

Summary Descriptive Statistics for the Pseudo-Populations. 

Achievement Data (N=1000) 

Scale            Mean     SD       Skewness   Kurtosis   Reliabilityᵃ 
Mathematics      710.17   25.8fi   -0.01      2.02       0.96 
Reading          725.09   57.27    -0.41      0.50       0.96 
Language         707.77   42.18    -0.50      1.86       0.94 

Correlations     Mathematics   Reading 
Reading          0.69 
Language         0.75          O.S'^ 

Likert Rating Data (N=618) 

Scale                          Mean   SD     Skewness   Kurtosis   Reliabilityᵃ 
Job-Relatedness                3.75   0.74   -0.50      0.17       0.84 
Importance for Certification   3.75   0.76   -0.53      0.15       0.89 
Frequency of Use               3.64   0.73   -0.47      0.09       0.83 

Correlations                   Job-Relatedness   Importance for Certification 
Importance for Certification   0.83 
Frequency of Use               0.90              0.79 

Psychological Trait Data (N=908) 

Scale                  Mean    SD      Skewness   Kurtosis   Reliabilityᵃ 
Self-Anxiety           50.00   10.00   -1.26      0.82       0.81 
Parental Anxiety       50.00   10.00   -0.04      -0.12      0.80 
Parent-Child Anxiety   50.00   10.00   -0.42      0.11       0.72 

Correlations           Self-Anxiety   Parental Anxiety 
Parental Anxiety       0.22 
Parent-Child Anxiety   0.44           0.38 

ᵃ The reported reliability is the KR-20 coefficient. 

Table 2 

Regression Models Evaluated in the Study as Computed on the 
Pseudo-populations from which Samples were Drawn 

Data Set              Dependent Variable   Predictors          Beta     R² 

Achievement Data      Math Score           * Reading Score     0.2338   0.5891 
                                           Language Score      0.5673 

Psychological         Parent-Child         Self-Anxiety        0.3353   0.2515 
Trait Data            Anxiety              Parental Anxiety    0.3060 

Likert Rating Data    Importance for       * Job-Relatedness   0.6263   0.6990 
                      Certification        Frequency of Use    0.2258 

Note. The predictor with missing values is coded with an 
asterisk. 
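
As a check on how the entries of Tables 1 and 2 fit together, the standardized weights of a two-predictor model can be recovered from the three correlations among the variables. The worked example below uses the Likert rating correlations reported in Table 1 (0.83, 0.79, and 0.90). The subscript notation is introduced here for illustration and is not the authors': y denotes Importance for Certification, 1 denotes Job-Relatedness, and 2 denotes Frequency of Use.

```latex
\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^2}
        = \frac{0.83 - (0.79)(0.90)}{1 - (0.90)^2}
        = \frac{0.119}{0.19} \approx 0.626

\beta_2 = \frac{r_{y2} - r_{y1}\, r_{12}}{1 - r_{12}^2}
        = \frac{0.79 - (0.83)(0.90)}{1 - (0.90)^2}
        = \frac{0.043}{0.19} \approx 0.226

R^2 = \beta_1 r_{y1} + \beta_2 r_{y2}
    \approx (0.626)(0.83) + (0.226)(0.79) \approx 0.698
```

These values agree with the Table 2 entries for the Likert rating data (0.6263, 0.2258, and 0.6990) to within the rounding of the correlations.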



Table 3 

Cell Means and Standard Deviations of R² in Three Types of Field Data. 

                             Achievement Data                                  Psychological Trait Data                          Likert Rating Data 
Sample Missing Data          Percent of Data Missing                           Percent of Data Missing                           Percent of Data Missing 
Size  Treatment              0%     10%    20%    30%    40%    50%    60%   | 0%     10%    20%    30%    40%    50%    60%   | 0%     10%    20%    30%    40%    50%    60% 

50    Mean Sub           MN  0.605  0.602  0.601  0.598  0.595  0.59/  0.595 | 0.257  0.243  0.241  0.222  0.212  0.208  0.200 | 0.708  0.689  0.672  0.662  0.657  0.651  0.650 
                         SD  0.100  0.101  0.103  0.100  0.104  0.102  0.103 | 0.094  0.097  0.094  0.096  0.096  0.103  0.094 | 0.098  0.101  0.103  0.108  0.105  0.107  0.109 
50    Simple Reg         MN  0.605  0.604  0.605  0.605  0.610  0.613  0.615 | 0.257  0.263  0.285  0.294  0.304  0.351  0.385 | 0.708  0.707  0.697  0.699  0.705  0.702  0.700 
                         SD  0.100  0.103  0.103  0.104  0.116  0.119  0.124 | 0.094  0.099  0.103  0.120  0.129  0.160  0.177 | 0.098  0.105  0.106  0.113  0.124  0.130  0.141 
50    Multiple Reg       MN  0.605  0.609  0.613  0.622  0.633  0.641  0.659 | 0.257  0.263  0.286  0.296  0.312  0.354  0.413 | 0.708  0.718  0.720  0.735  0.748  0.759  0.783 
                         SD  0.100  0.101  0.100  0.100  0.101  0.109  0.123 | 0.094  0.099  0.103  0.119  0.129  0.155  0.174 | 0.098  0.102  0.099  0.104  0.110  0.122  0.118 
50    Listwise Deletion  MN  0.605  0.606  0.609  0.6U   0.609  0.629  0.618 | 0.257  0.261  0.266  0.257  0.271  0.282  0.315 | 0.708  0.713  0.710  0.708  0.716  0.703  0.717 
                         SD  0.100  0.110  0.104  0.115  0.127  0.136  0.150 | 0.094  0.100  0.104  0.113  0.124  0.144  0.156 | 0.098  0.105  0.105  0.124  0.132  0.158  0.148 
50    Pairwise Deletion  MN  0.605  0.606  0.609  0.6U   0.619  0.622  0.623 | 0.257  0.253  0.264  0.259  0.258  0.274  0.286 | 0.708  0.715  0.715  0.716  0.724  0.730  0.747 
                         SD  0.100  0.101  0.100  0.100  0.099  0.101  0.104 | 0.094  0.098  0.098  0.107  0.107  0.126  0.117 | 0.098  0.103  0.101  0.113  0.113  0.129  0.125 

100   Mean Sub           MN  0.591  0.588  0.585  0.581  0.581  0.580  0.578 | 0.256  0.243  0.235  0.223  0.205  0.200  0.191 | 0.707  0.680  0.668  0.659  0.651  0.648  0.645 
                         SD  0.065  0.066  0.065  0.065  0.065  0.065  0.067 | 0.074  0.071  0.069  0.070  0.069  0.074  0.066 | 0.075  0.075  0.077  0.079  0.079  0.080 
100   Simple Reg         MN  0.591  0.588  0.586  0.586  0.587  0.590  0.601 | 0.256  0.267  0.286  0.301  0.306  0.345  0.377 | 0.707  0.703  0.696  0.690  0.688  0.696  0.680 
                         SD  0.065  0.065  0.065  0.068  0.071  0.078  0.091 | 0.074  0.078  0.080  0.084  0.095  0.127  0.148 | 0.075  0.076  0.083  0.079  0.095  0.106  0.101 
100   Multiple Reg       MN  0.591  0.593  0.595  0.602  0.605  0.617  0.631 | 0.256  0.266  0.283  0.298  0.300  0.341  0.385 | 0.707  0.713  0.724  0.732  0.744  0.766  0.770 
                         SD  0.065  0.065  0.067  0.070  0.070  0.078  0.086 | 0.074  0.077  0.081  0.085  0.093  0.126  0.141 | 0.075  0.075  0.077  0.077  0.081  0.083  0.085 
100   Listwise Deletion  MN  0.591  0.594  0.596  0.599  0.604  0.614  0.601 | 0.256  0.256  0.261  0.261  0.252  0.270  0.276 | 0.707  0.706  0.714  0.707  0.712  0.718  0.714 
                         SD  0.065  0.068  0.079  0.080  0.090  0.093  0.108 | 0.074  0.076  0.078  0.085  0.095  0.119  0.123 | 0.075  0.078  0.082  0.093  0.096  0.106  0.105 
100   Pairwise Deletion  MN  0.591  0.592  0.592  0.595  0.596  0.601  0.603 | 0.256  0.256  0.259  0.259  0.248  0.260  0.267 | 0.707  0.706  0.714  0.712  0.715  0.725  0.726 
                         SD  0.065  0.065  0.067  0.067  0.067  0.070  0.071 | 0.074  0.074  0.075  0.076  0.081  0.096  0.093 | 0.075  0.076  0.080  0.083  0.083  0.098  0.097 

200   Mean Sub           MN  0.587  0.583  0.580  0.575  0.573  0.572  0.571 | 0.262  0.249  0.239  0.229  0.214  0.204  0.196 | 0.696  0.669  0.655  0.647  0.638  0.636  0.631 
                         SD  0.043  0.042  0.043  0.041  0.042  0.041  0.042 | 0.055  0.055  0.054  0.054  0.053  0.055  0.056 | 0.056  0.055  0.058  0.061  0.062  0.062  0.064 
200   Simple Reg         MN  0.587  0.586  0.583  0.581  0.578  0.577  0.577 | 0.262  0.272  0.288  0.307  0.329  0.356  0.410 | 0.696  0.689  0.681  0.677  0.671  0.669  0.668 
                         SD  0.043  0.042  0.043  0.044  0.043  0.043  0.045 | 0.055  0.057  0.062  0.067  0.072  0.088  0.103 | 0.056  0.056  0.057  0.059  0.065  0.073  0.083 
200   Multiple Reg       MN  0.587  0.590  0.592  0.598  0.602  0.609  0.618 | 0.262  0.271  0.286  0.304  0.324  0.350  0.401 | 0.696  0.704  0.711  0.720  0.732  0.755  0.767 
                         SD  0.043  0.043  0.045  0.044  0.051  0.055  0.058 | 0.055  0.058  0.063  0.068  0.072  0.090  0.107 | 0.056  0.055  0.054  0.058  0.063  0.069  0.070 
200   Listwise Deletion  MN  0.587  0.589  0.586  0.589  0.593  0.579  0.586 | 0.262  0.260  0.262  0.266  0.264  0.265  0.274 | 0.696  0.697  0.698  0.698  0.695  0.702  0.699 
                         SD  0.043  0.045  0.050  0.053  0.059  0.066  0.073 | 0.055  0.057  0.060  0.065  0.068  0.078  0.0^   | 0.056  0.056  0.058  0.069  0.071  0.07V  0.082 
200   Pairwise Deletion  MN  0.587  0.588  0.587  0.589  0.589  0.588  0.589 | 0.262  0.261  0.262  0.265  0.263  0.264  0.275 | 0.696  0.697  0.699  0.698  0.698  0.706  0.704 
                         SD  0.045  0.043  0.044  0.043  0.046  0.046  0.045 | 0.055  0.056  0.058  0.060  0.061  0.070  0.076 | 0.056  0.055  0.057  0.064  0.065  0.069  0.073 

Table 4 

Cell Means and Standard Deviations of Beta for the Variable With Missing Data in Three Types of Field Data. 

                             Achievement Data                                  Psychological Trait Data                          Likert Rating Data 
Sample Missing Data          Percent of Data Missing                           Percent of Data Missing                           Percent of Data Missing 
Size  Treatment              0%     10%    20%    30%    40%    50%    60%   | 0%     10%    20%    30%    40%    50%    60%   | 0%     10%    20%    30%    40%    50%    60% 

50    Mean Sub           MN  0.211  0.165  0.141  0.113  0.087  0.085  0.074 | 0.324  0.298  0.296  0.258  0.241  0.225  0.204 | 0.616  0.416  0.295  0.225  0.165  0.119  0.112 
                         SD  0.163  0.155  0.147  0.144  0.124  0.133  0.117 | 0.111  0.113  0.105  0.112  0.104  0.112  0.119 | 0.240  0.257  0.173  0.152  0.144  0.125  0.114 
50    Simple Reg         MN  0.211  0.21?  0.207  0.200  0.209  0.212  0.205 | 0.324  0.330  0.360  0.563  0.375  0.424  0.440 | 0.616  0.612  0.609  0.610  0.603  0.607  0.646 
                         SD  0.163  0.169  0.178  0.211  0.233  0.270  0.288 | 0.111  0.123  0.132  0.161  0.169  0.197  0.253 | 0.240  0.262  0.259  0.295  0.520  0.558  0.408 
50    Multiple Reg       MN  0.211  0.234  0.248  0.272  0.317  0.323  0.358 | 0.324  0.331  0.365  0.574  0.398  0.447  0.485 | 0.616  0.666  0.721  0.778  0.857  0.968  1.106 
                         SD  0.163  0.184  0.207  0.285  0.326  0.412  0.556 | 0.111  0.121  0.127  0.153  0.161  0.180  0.287 | 0.240  0.279  0.289  0.556  0.585  0.451  0.656 
50    Listwise Deletion  MN  0.211  0.213  0.210  0.207  0.209  0.204  0.189 | 0.324  0.313  0.331  0.518  0.314  0.325  0.323 | 0.616  0.615  0.614  0.620  0.599  0.610  0.647 
                         SD  0.163  0.167  0.175  0.214  0.211  0.254  0.273 | 0.111  0.119  0.118  0.136  0.138  0.150  0.190 | 0.240  0.260  0.256  0.290  0.321  0.544  0.591 
50    Pairwise Deletion  MN  0.211  0.214  0.213  0.221  0.209  0.233  0.193 | 0.324  0.317  0.335  0.517  0.320  0.334  0.345 | 0.616  0.655  0.645  0.625  0.626  0.620  0.765 
                         SD  0.163  0.169  0.192  0.248  0.269  0.294  0.307 | 0.111  0.116  0.118  0.137  0.135  0.160  0.190 | 0.240  0.505  0.548  0.571  0.408  0.605  0.719 

100   Mean Sub           MN  0.215  0.17^  0.141  0.112  0.097  0.090  0.064 | 0.335  0.315  0.299  0.276  0.239  0.226  0.205 | 0.620  0.405  0.299  0.225  0.167  0.142  0.108 
                         SD  0.117  0.114  0.106  0.095  0.099  0.090  0.092 | 0.080  0.073  0.080  0.080  0.078  0.086  0.082 | 0.195  0.144  0.124  0.110  0.101  0.080  0.079 
100   Simple Reg         MN  0.215  0.209  0.197  0.198  0.189  0.204  0.211 | 0.335  0.353  0.377  0.395  0.400  0.438  0.464 | 0.620  0.614  0.621  0.605  0.615  0.657  0.587 
                         SD  0.117  0.121  0.122  0.143  0.160  0.187  0.237 | 0.080  0.061  0.093  0.106  0.116  0.154  0.189 | 0.195  0.197  0.204  0.204  0.226  0.246  0.295 
100   Multiple Reg       MN  0.215  0.232  0.246  0.281  0.501  0.361  0.375 | 0.335  0.352  0.375  0.395  0.399  0.447  0.494 | 0.620  0.662  0.747  0.799  0.900  1.020  1.102 
                         SD  0.117  0.135  0.149  0.194  0.221  0.280  0.374 | 0.080  0.082  0.093  0.103  0.107  0.142  0.164 | 0.195  0.201  0.229  0.246  0.274  0.279  0.458 
100   Listwise Deletion  MN  0.215  0.212  0.204  0.210  0.196  0.208  0.204 | 0.335  0.334  0.337  0.336  0.313  0.325  0.353 | 0.620  0.619  0.650  0.616  0.627  0.648  0.598 
                         SD  0.117  0.123  0.124  0.147  0.153  0.166  0.206 | 0.080  0.078  0.087  0.095  0.096  0.117  0.129 | 0.195  0.198  0.206  0.205  0.223  0.259  0.292 
100   Pairwise Deletion  MN  0.215  0.220  0.213  0.223  0.218  0.243  0.214 | 0.335  0.335  0.339  0.337  0.519  0.331  0.339 | 0.620  0.611  0.659  0.628  0.640  0.687  0.676 
                         SD  0.117  0.132  0.135  0.156  0.170  0.184  0.224 | 0.080  0.078  0.087  0.092  0.095  0.119  0.151 | 0.193  0.191  0.241  0.264  0.277  0.546  0.487 

200   Mean Sub           MN  0.237  0.191  0.160  0.120  0.096  0.085  0.065 | 0.341  0.319  0.303  0.283  0.256  0.252  0.213 | 0.628  0.408  0.294  0.252  0.170  0.140  0.108 
                         SD  0.094  0.090  0.089  0.079  0.070  0.071  0.068 | 0.058  0.058  0.056  0.061  0.055  0.057  0.060 | 0.129  0.115  0.095  0.077  0.066  0.072  0.052 
200   Simple Reg         MN  0.237  0.235  0.226  0.222  0.206  0.216  0.207 | 0.341  0.357  0.379  0.402  0.451  0.461  0.520 | 0.628  0.625  0.610  0.612  0.615  0.655  0.608 
                         SD  0.094  0.101  0.104  0.108  0.121  0.130  0.150 | 0.058  0.061  0.065  0.080  0.080  0.096  0.106 | 0.129  0.157  0.140  0.157  0.159  0.185  0.181 
200   Multiple Reg       MN  0.237  0.260  0.275  0.314  0.339  0.394  0.441 | 0.341  0.355  0.376  0.401  0.428  0.459  0.519 | 0.628  0.682  0.737  0.802  0.909  1.041  1.151 
                         SD  0.094  0.108  0.119  0.141  0.192  0.219  0.277 | 0.058  0.063  0.067  0.079  0.081  0.092  0.105 | 0.129  0.145  0.160  0.168  0.188  0.221  0.256 
200   Listwise Deletion  MN  0.237  0.239  0.233  0.235  0.223  0.2S6  0.226 | 0.341  0.339  0.541  0.342  0.559  0.358  0.552 | 0.628  0.650  0.622  0.628  0.656  0.660  0.652 
                         SD  0.094  0.101  0.105  0.110  0.130  0.141  0.154 | 0.058  0.061  0.063  0.071  0.069  0.079  0.091 | 0.129  0.158  0.141  0.159  0.158  0.186  0.174 
200   Pairwise Deletion  MN  0.237  0.241  0.232  0.238  0.235  0.221  0.219 | 0.341  0.339  0.341  0.545  0.540  0.540  0.555 | 0.628  0.652  0.655  0.631  0.657  0.655  0.644 
                         SD  0.094  0.101  0.108  0.121  0.134  0.145  0.156 | 0.058  0.061  0.062  0.071  0.069  0.077  0.088 | 0.129  0.147  0.172  0.165  0.202  0.256  0.248 



ERIC 



ge^ Means and Starxjard Deviations of Beta for the Variable Without Missing Data m Three Typefe of Field Data . 



AchievcfTient Data 



Psychological Trait Data 



Likert Rating Data 



Missing 
Sample Data 
SUe Treatment 



Percent of O^ta Missing 



Percent of Data Missing 



Percent of Data Missing 



OX 



10% 



20% 



30X 40% 50% 



60% 



0% 



10% 



20% 



30% 40% 



50% 



60% 



50 


Mean 


MM 


0. 


589 


0. 


634 


0. 


659 


0.683 


0.706 


0,711 


0. 


726 


0. 


289 


0. 


295 


0.303 


0.314 


0. 


317 


0. 


325 


0.329 


0.234 


0.455 


0.552 


0.6Z3 


0-677 


U- 718 


U- 728 




Sub 


SO 


0. 


159 


0. 


146 


0. 


137 


0.121 


0.105 


G.109 


0. 


090 


0. 


146 


0. 


145 


0.148 


0.143 


0. 


149 


0. 


149 


0.153 


0.243 


0.239 


0.178 


0,150 


0-145 


0-111 


0.109 


50 


Sfirple 


MN 


0. 


589 


0. 


584 


0. 


588 


0,592 


0.584 


0.577 


0. 


584 


0. 


289 


0. 


285 


0,281 


0.276 


0. 


271 


0, 


260 


0.250 


0,234 


0,235 


0,231 


0.230 


0-239 


0.251 


0.188 




Reg 


SO 


0. 


159 


0. 


164 


0. 


169 


0.196 


0.213 


0.245 


0. 


257 


0, 


146 


0. 


145 


0.148 


0.141 


0. 


146 


0. 


145 


0.163 


0,243 


0,262 


0.258 


0.290 


0-307 


0-517 


0-395 


50 


Multiple 


MN 


0. 


589 


0. 


566 


0. 


553 


0.531 


0.489 


0.475 


0. 


439 


0. 


289 


0. 


283 


0.276 


0.269 


0. 


256 


0. 


249 


0.238 


0.254 


0.185 


0.128 


0.074 


-0-005 


-0.116 


-0-252 




Reg 


SD 


0- 


159 


0, 


177 


0. 


194 


0.261 


0,307 


0.380 


0,518 


0. 


146 


0. 


146 


0.153 


0.149 


0. 


167 


0, 


175 


0-245 


0.245 


0,278 


0.292 


0.554 


0-576 


0,455 


0-651 


50 


Listwise 


MN 


0. 


589 


0. 


586 


0, 


592 


0.595 


0.590 


0.600 


0. 


604 


0. 


289 


0. 


299 


0.285 


0.275 


0. 


299 


0. 


289 


0-307 


0.254 


0,239 


0.236 


0.227 


0-250 


0-224 


0.197 




Deletion 


SD 


0, 


159 


0, 


(67 


0. 


165 


0.200 


0.196 


0.246 


0. 


260 


0. 


146 


0. 


154 


0.167 


0.178 


0. 


182 


0. 


200 


0.231 


0.245 


0,260 


0.261 


0,291 


0-529 


0-346 


0-597 


50 


Pairwise 


MN 


0. 


589 


0. 


586 


0. 


588 


0.581 


0.592 


0.568 


0. 


603 


0. 


289 


0. 


288 


0.290 


0.295 


0. 


289 


0. 


292 


0-285 


0-254 


0.216 


0,205 


0.227 


0-224 


0-225 


0-086 




Delet ion 


SO 


0. 


159 


0. 


161 


0. 


182 


0,226 


0.244 


0.259 


0, 


267 


0. 


146 


0. 


145 


0.150 


0.145 


0. 


158 


0. 


163 


0.181 


0.245 


0.502 


0-347 


0.558 


0-589 


0,568 


0-706 


100  Mean Sub           MN   0.582  0.616  0.653  0.678  0.694  0.703  0.721     0.298  0.305  0.313  0.321  0.330  0.335  0.343     0.233  0.446  0.551  0.624  0.676  0.703  0.733
                        SD   0.102  0.097  0.088  0.078  0.072  0.069  0.057     0.090  0.089  0.095  0.091  0.092  0.090  0.096     0.196  0.150  0.127  0.103  0.098  0.081  0.078


100  Simple Reg         MN   0.582  0.583  0.590  0.586  0.592  0.577  0.574     0.298  0.292  0.286  0.281  0.278  0.262  0.255     0.233  0.234  0.221  0.233  0.219  0.200  0.239
                        SD   0.102  0.109  0.111  0.128  0.138  0.161  0.193     0.090  0.088  0.092  0.091  0.092  0.091  0.106     0.196  0.200  0.206  0.204  0.220  0.238  0.291


100  Multiple Reg       MN   0.582  0.564  0.550  0.517  0.494  0.438  0.425     0.298  0.291  0.282  0.274  0.268  0.     0.233     0.233  0.192  0.106  0.055 -0.046 -0.162 -0.250
                        SD   0.102  0.121  0.130  0.169  0.195  0.248  0.327     0.090  0.090  0.095  0.095  0.100  0.111  0.139     0.196  0.205  0.232  0.245  0.275  0.281  0.436


100  Listwise Deletion  MN   0.582  0.586  0.594  0.589  0.603  0.599  0.590     0.298  0.297  0.299  0.298  0.303  0.303  0.297     0.233  0.232  0.225  0.236  0.225  0.206  0.254
                        SD   0.102  0.105  0.113  0.134  0.143  0.152  0.186     0.090  0.094  0.102  0.106  0.122  0.139  0.160     0.196  0.205  0.211  0.208  0.228  0.251  0.300


100  Pairwise Deletion  MN   0.582  0.577  0.583  0.574  0.578  0.557  0.582     0.298  0.297  0.297  0.299  0.303  0.299  0.301     0.235  0.242  0.194  0.224  0.212  0.169  0.176
                        SD   0.102  0.117  0.115  0.132  0.143  0.153  0.180     0.090  0.089  0.094  0.093  0.095  0.095  0.110     0.196  0.194  0.240  0.254  0.268  0.534  0.477


200  Mean Sub           MN   0.560  0.604  0.636  0.670  0.691  0.703  0.718     0.306  0.314  0.322  0.328  0.338  0.346  0.352     0.220  0.438  0.551  0.611  0.668  0.698  0.726
                        SD   0.083  0.078  0.071  0.063  0.051  0.049  0.045     0.068  0.068  0.068  0.071  0.068  0.069  0.069     0.134  0.124  0.101  0.081  0.074  0.071  0.058


200  Simple Reg         MN   0.560  0.558  0.562  0.563  0.571  0.558  0.564     0.306  0.301  0.295  0.288  0.280  0.269  0.246     0.220  0.219  0.225  0.220  0.211  0.190  0.213
                        SD   0.083  0.090  0.092  0.096  0.111  0.119  0.139     0.068  0.067  0.066  0.070  0.066  0.066  0.065     0.154  0.143  0.148  0.161  0.161  0.180  0.170


200  Multiple Reg       MN   0.560  0.538  0.522  0.485  0.457  0.403  0.355     0.306  0.300  0.291  0.282  0.271  0.258  0.230     0.220  0.166  0.111  0.047 -0.061 -0.189 -0.500
                        SD   0.083  0.095  0.103  0.125  0.169  0.194  0.249     0.068  0.067  0.068  0.074  0.073  0.077  0.089     0.154  0.147  0.166  0.171  0.190  0.220  0.237


200  Listwise Deletion  MN   0.560  0.559  0.563  0.562  0.574  0.552  0.564     0.306  0.305  0.304  0.307  0.307  0.309  0.301     0.220  0.218  0.227  0.220  0.209  0.188  0.216
                        SD   0.083  0.091  0.092  0.098  0.117  0.128  0.151     0.068  0.069  0.075  0.080  0.085  0.091  0.098     0.134  0.144  0.146  0.172  0.166  0.191  0.178


200  Pairwise Deletion  MN   0.560  0.556  0.564  0.559  0.561  0.572  0.574     0.306  0.307  0.307  0.306  0.509  0.311  0.308     0.220  0.216  0.214  0.217  0.209  0.197  0.205
                        SD   0.083  0.089  0.091  0.105  0.112  0.120  0.130     0.068  0.068  0.068  0.073  0.069  0.071  0.075     0.134  0.151  0.175  0.160  0.198  0.226  0.237



0%  10%  20%  30%  40%  50%  60%
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Table 6 

Effect Sizes of Mean Value of R-Square Obtained Under Missing Data Treatments Relative to the Distribution Under Complete Data Conditions.

Sample  Missing Data   Achievement Data                             Psychological Trait Data                     Likert Rating Data
Size    Treatment      Percent of Data Missing                      Percent of Data Missing                      Percent of Data Missing
                          10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%

50      Mean Sub       -0.030 -0.040 -0.070 -0.100 -0.080 -0.100    -0.149 -0.170 -0.372 -0.479 -0.521 -0.606    -0.194 -0.367 -0.469 -0.520  0.582 -0.592
50      Simple Reg     -0.010  0.000  0.000  0.050  0.080  0.100     0.064  0.298  0.39^  0.500  I.OCW  1.362    -0.010 -0.112 -0.092 -0.031 -0.061 -0.082
50      Multiple Reg    0.040  0.080  0.170  0.280  0.560  0.540     0.064  0.309  0.415  0.585  1.032  1.660     0.102  0.122  0.276  0.408  0.520  0.765
50      Listwise Del    0.010  0.040  0.090  0.040  0.240  0.130     0.043  0.096  0.000  0.149  0.266  0.617     0.051  0.020  0.000  0.082 -0.051  0.092
50      Pairwise Del    0.010  0.040  0.090  0.140  0.170  0.180     0.043  0.074  0.021  0.011  0.181  0.309     0.071  0.071  0.082  0.163  0.224  0.398

100     Mean Sub       -0.046 -0.092 -0.154 -0.154 -0.169 -0.200    -0.176 -0.284 -0.446 -0.689 -0.757 -0.878    -0.360 -0.520 -0.640 -0.747 -0.787 -0.827
100     Simple Reg     -0.046 -0.077 -0.077 -0.062 -0.015  0.154     0.149  0.405  0.608  0.676  1.203  1.635    -0.053 -0.147 -0.227 -0.253  0.147 -0.360
100     Multiple Reg    0.031  0.062  0.169  0.215  0.400  0.615     0.135  0.365  0.568  0.595  1.149  1.743     0.080  0.227  0.333  0.493  0.787  0.840
100     Listwise Del    0.046  0.077  0.123  0.200  0.354  0.154     0.000  0.068  0.068 -0.054  0.189  0.270    -0.013  0.093  0.000  0.067  0.147  0.093
100     Pairwise Del    0.015  0.015  0.062  0.077  0.154  0.185     0.000  0.041  0.041 -0.108  0.054  0.149    -0.013  0.093  0.067  0.107  0.240  0.253

200     Mean Sub       -0.093 -0.163 -0.279 -0.326 -0.349 -0.372    -0.236 -0.418 -0.600 -0.873 -1.r55 -1.200    -0.482 -0.732 -0.875 -1.036 -1.071 -1.161
200     Simple Reg     -0.023 -0.093 -0.140 -0.209 -0.233 -0.233     0.182  0.473  0.818  1.218  1.709  2.691    -0.125 -0.268 -0.339 -0.446 -0.482 -0.500
200     Multiple Reg    0.070  0.116  0.256  0.349  0.512  0.721     0.164  0.436  0.764  1.127  1.600  2.527     0.143  0.268  0.429  0.643  1.054  1.268
200     Listwise Del    0.047 -0.023  0.047  0.140 -0.186 -0.023    -0.036  0.000  0.073  0.036  0.055  0.218     0.018  0.036  0.036 -0.018  0.107  0.054
200     Pairwise Del    0.023  0.000  0.047  0.047  0.023  0.047    -0.018  0.000  0.055  0.018  0.036  0.236     0.018  0.054  0.036  0.036  0.179  0.143






Table 7

Effect Sizes of Mean Value of Beta for the Variable With Missing Data Obtained Under Missing Data Treatments Relative to the Distribution Under Complete Data Conditions.

Sample  Missing Data   Achievement Data                             Psychological Trait Data                     Likert Rating Data
Size    Treatment      Percent of Data Missing                      Percent of Data Missing                      Percent of Data Missing
                          10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%

50      Mean Sub       -0.282 -0.429 -0.601 -0.761 -0.773 -0.840    -0.234 -0.252 -0.595 -0.748 -0.892 -1.081    -0.833 -1.338 -1.638 -1.879 -2.071 -2.100
50      Simple Reg      0.012 -0.025 -0.067 -0.012  0.006 -0.037     0.054  0.324  0.351  0.459  0.901  1.045    -0.017 -0.029 -0.025 -0.054 -0.038  0.125
50      Multiple Reg    0.141  0.227  0.374  0.650  0.657  0.902     0.063  0.369  0.450  0.667  1.108  1.450     0.208  0.438  0.675  1.004  1.467  2.042
50      Listwise Del    0.012 -0.006 -0.025 -0.012 -0.043 -0.135    -0.099  0.063 -0.054 -0.090  0.009 -0.009    -0.013 -0.008  0.017 -0.071 -0.025  0.129
50      Pairwise Del    0.018  0.012  0.061 -0.012  0.135 -0.110    -0.063  0.081 -0.063 -0.036  0.090  0.189     0.079  0.121  0.029  0.042  0.017  0.621

100     Mean Sub       -0.308  0.632 -0.880 -1.009 -1.068 -1.291    -0.250 -0.450 -0.738 -1.200 -1.363 -1.625    -1.114 -1.663 -2.0^7 -2.347 -2.477 -2.653
100     Simple Reg     -0.051 -0.154 -0.145 -0.222 -0.094 -0.034     0.225  0.525  0.750  0.813  1.288  1.613    -0.031  0.005 -0.078 -0.026  0.088 -0.171
100     Multiple Reg    0.145  0.265  0.564  0.735  1.248  1.368     0.213  0.500  0.750  0.800  1.400  1.988     0.218  0.658  0.927  1.451  2.073  2.497
100     Listwise Del   -0.026 -0.094 -0.043 -0.162 -0.060 -0.094    -0.013  0.025  0.013 -0.275 -0.125 -0.025    -0.005  0.052 -0.021  0.036  0.145 -0.114
100     Pairwise Del    0.043 -0.017  0.068  0.026  0.239 -0.009     0.000  0.050  0.025 -0.200 -0.?50  0.050    -0.047  0.202  0.041  0.104  0.547  0.290

200     Mean Sub       -0.489 -0.819 -1.245 -1.500 -1.617 -1.830    -0.379 -0.655 -1.000 -1.466 -1.8^9 -2.207    -1.705 -2.589 -3.070 -3.550 -3.783 -4.031
200     Simple Reg     -0.021 -0.117 -0.160 -0.330 -0.223 -0.319     0.276  0.655  1.052  1.552  2.069  3.086    -0.039 -0.140 -0.124 -0.101  0.039 -0.155
200     Multiple Reg    0.245  0.404  0.819  1.085  1.670  2.170     0.241  0.603  1.034  1.500  2.034  3.069     0.419  0.845  1.549  2.178  3.202  4.054
200     Listwise Del    0.021 -0.043 -0.021 -0.149 -0.011 -0.117    -0.034  0.000  0.017 -0.034 -0.052  0.190     0.016 -0.047  0.000  0.062  0.248  0.031
200     Pairwise Del    0.043 -0.053  0.011 -0.021 -0.170 -0.191    -0.034  0.000  0.034 -0.017 -0.017  0.241     0.031  0.054  0.023  0.070  0.194  0.124
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Table 8 

Effect Sizes of Mean Value of Beta for the Variable Without Missing Data Obtained Under Missing Data Treatments Relative to the Distribution Under Complete Data Conditions.

Sample  Missing Data   Achievement Data                             Psychological Trait Data                     Likert Rating Data
Size    Treatment      Percent of Data Missing                      Percent of Data Missing                      Percent of Data Missing
                          10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%

50      Mean Sub        0.283  0.440  0.591  0.736  0.767  0.862     0.041  0.096  0.171  0.192  0.247  0.274     0.819  1.309  1.601  1.823  1.992  2.033
50      Simple Reg     -0.031 -0.006  0.019 -0.031 -0.075 -0.031    -0.027 -0.055 -0.089 -0.123 -0.199 -0.267     0.004 -0.012 -0.016  0.021 -0.012 -0.189
50      Multiple Reg   -0.145 -0.226 -0.365 -0.629 -0.717 -0.943    -0.041 -0.089 -0.137 -0.226 -0.274 -0.349    -0.202 -0.436 -0.658 -0.984 -1.440 -2.000
50      Listwise Del   -0.019  0.019  0.038  0.006  0.069  0.094     0.068 -0.027 -0.096  0.068  0.000  0.123     0.021  0.008 -0.029  0.066 -0.041 -0.152
50      Pairwise Del   -0.019 -0.006 -0.050  0.019 -0.132  0.088    -0.007  0.007  0.041  0.000  0.021 -0.027    -0.074 -0.119 -0.029 -0.041 -0.037 -0.609

100     Mean Sub        0.333  0.696  0.941  1.098  1.186  1.363     0.078  0.167  0.256  0.356  0.411  0.500     1.087  1.622  1.995  2.260  2.398  2.551
100     Simple Reg      0.010  0.078  0.039  0.098 -0.049 -0.078    -0.067 -0.133 -0.189 -0.222 -0.400 -0.478     0.005 -0.061  0.000 -0.071 -0.168  0.031
100     Multiple Reg   -0.176 -0.314 -0.637 -0.863 -1.412 -1.559    -0.078 -0.178 -0.267 -0.333 -0.544 -0.722    -0.209 -0.648 -0.908 -1.423 -2.015 -2.464
100     Listwise Del    0.039  0.118  0.069  0.206  0.167  0.078    -0.011  0.011  0.000  0.056  0.056 -0.011    -0.005 -0.041  0.015 -0.041 -0.138  0.107
100     Pairwise Del   -0.049  0.010 -0.078 -0.039 -0.245  0.000    -0.011 -0.011  0.011  0.056  0.011  0.033     0.046 -0.199 -0.046 -0.107 -0.327 -0.291

200     Mean Sub        0.530  0.916  1.325  1.578  1.723  1.904     0.118  0.235  0.324  0.471  0.588  0.676     1.627  2.470  2.918  3.343  3.567  3.776
200     Simple Reg     -0.024  0.024  0.036  0.133 -0.024  0.048    -0.074 -0.162 -0.265 -0.382 -0.544 -0.882    -0.007  0.037  0.000 -0.067 -0.224 -0.052
200     Multiple Reg   -0.265 -0.458 -0.904 -1.241 -1.892 -2.470    -0.088 -0.221 -0.353 -0.515 -0.706 -1.118    -0.403 -0.813 -1.291 -2.097 -3.052 -3.881
200     Listwise Del   -0.012  0.036  0.024  0.169 -0.096  0.048     0.015 -0.029  0.015  0.015  0.044 -0.074    -0.015  0.052  0.000 -0.082 -0.239 -0.030
200     Pairwise Del   -0.048  0.048  0.012  0.012  0.145  0.169     0.015  0.015  0.000  0.000  0.074  0.029    -0.030 -0.045 -0.022 -0.082 -0.172 -0.112
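
The effect sizes tabled above appear to be the difference between a treatment's mean estimate and the complete-data mean, divided by the complete-data standard deviation. That formula is an assumption inferred from the tabled values rather than a definition quoted from the text, and the function and variable names below are illustrative only. A minimal sketch, using the means and standard deviation reported for sample size 100, Likert rating data, mean substitution, and the variable without missing data:

```python
def effect_size(treatment_mean, complete_mean, complete_sd):
    """Standardized distance of a treatment mean from the complete-data mean.

    Assumed formula (inferred from the tables, not quoted from the paper):
    (treatment mean - complete-data mean) / complete-data SD.
    """
    return (treatment_mean - complete_mean) / complete_sd


# Values read from the tables: n = 100, Likert rating data, mean substitution,
# regression weight for the variable without missing data.
complete_mean = 0.233  # mean weight with 0% of the data missing
complete_sd = 0.196    # standard deviation with 0% of the data missing
treatment_means = {10: 0.446, 20: 0.551, 30: 0.624, 40: 0.676}

effect_sizes = {
    pct: round(effect_size(mean, complete_mean, complete_sd), 3)
    for pct, mean in treatment_means.items()
}

for pct, es in effect_sizes.items():
    print(f"{pct}% missing: effect size {es:.3f}")
```

Under this assumption the sketch gives 1.087, 1.622, 1.995 and 2.260, which agree with the Table 8 entries for the same cells.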
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Table 9 

Number and percent of cells yielding effect sizes greater than or equal to 0.3.


Estimation of R-Square

Missing Data           Achievement    Psychological    Likert Rating
Treatment              Data           Trait Data       Data             Overall

Mean Substitution       3 ( 17%)      13 ( 72%)        17 ( 94%)        33 (61%)
Multiple Regression     7 ( 39%)      15 ( 83%)        11 ( 61%)        33 (61%)
Simple Regression       0 (  0%)      14 ( 78%)         5 ( 28%)        19 (35%)
Listwise Deletion       1 (  6%)       1 (  6%)         0 (  0%)         2 ( 4%)
Pairwise Deletion       0 (  0%)       1 (  6%)         1 (  6%)         2 ( 4%)


Estimation of the Regression Weight
for the Variable With Missing Data

Missing Data           Achievement    Psychological    Likert Rating
Treatment              Data           Trait Data       Data             Overall

Mean Substitution      17 ( 94%)      15 ( 83%)        18 (100%)        50 (93%)
Multiple Regression    13 ( 72%)      15 ( 83%)        16 ( 89%)        44 (81%)
Simple Regression       2 ( 11%)      15 ( 83%)         0 (  0%)        17 (31%)
Listwise Deletion       0 (  0%)       0 (  0%)         0 (  0%)         0 ( 0%)
Pairwise Deletion       0 (  0%)       0 (  0%)         2 ( 11%)         2 ( 4%)


Estimation of the Regression Weight
for the Variable Without Missing Data

Missing Data           Achievement    Psychological    Likert Rating
Treatment              Data           Trait Data       Data             Overall

Mean Substitution      17 ( 94%)       7 ( 39%)        18 (100%)        42 (78%)
Multiple Regression    14 ( 78%)       8 ( 44%)        16 ( 89%)        38 (70%)
Simple Regression       0 (  0%)       5 ( 28%)         0 (  0%)         5 ( 9%)
Listwise Deletion       0 (  0%)       0 (  0%)         0 (  0%)         0 ( 0%)
Pairwise Deletion       0 (  0%)       0 (  0%)         2 ( 11%)         2 ( 4%)
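
The counts in Table 9 can be reproduced from the effect size tables by tallying, for each treatment and data set, the 18 cells (3 sample sizes by 6 missing-data levels) whose effect size reaches 0.3. The sketch below assumes the criterion is applied to the absolute value of the effect size, an assumption that matches the counts for treatments with negative effect sizes; the function name is hypothetical, and the values are the Table 8 entries for the achievement data.

```python
def count_large_effects(effect_sizes, threshold=0.3):
    """Return (count, percent) of cells with |effect size| >= threshold."""
    large = sum(1 for es in effect_sizes if abs(es) >= threshold)
    percent = round(100 * large / len(effect_sizes))
    return large, percent


# Table 8, achievement data: rows for n = 50, 100, 200; columns 10% to 60%.
mean_substitution = [
    0.283, 0.440, 0.591, 0.736, 0.767, 0.862,
    0.333, 0.696, 0.941, 1.098, 1.186, 1.363,
    0.530, 0.916, 1.325, 1.578, 1.723, 1.904,
]
multiple_regression = [
    -0.145, -0.226, -0.365, -0.629, -0.717, -0.943,
    -0.176, -0.314, -0.637, -0.863, -1.412, -1.559,
    -0.265, -0.458, -0.904, -1.241, -1.892, -2.470,
]

mean_sub_result = count_large_effects(mean_substitution)
multiple_reg_result = count_large_effects(multiple_regression)

print("Mean substitution:   %d (%d%%)" % mean_sub_result)
print("Multiple regression: %d (%d%%)" % multiple_reg_result)
```

With these inputs the tallies are 17 (94%) and 14 (78%), the same as the achievement data entries in the third panel of Table 9.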
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Table 10 

Ratios of Within-cell Standard Deviation of R-Square Under Missing Data Treatments to the Within-cell Standard Deviation Under Complete Data Conditions.

Sample  Missing Data   Achievement Data                             Psychological Trait Data                     Likert Rating Data
Size    Treatment      Percent Missing                              Percent Missing                              Percent Missing
                          10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%

50      Mean Sub        1.010  1.030  1.000  1.040  1.020  1.030     1.032  1.000  1.021  1.021  1.096  1.000     1.031  1.051  1.102  1.071  1.092  1.112
50      Simple Reg      1.030  1.030  1.040  1.160  1.190  1.240     1.053  1.096  1.277  1.372  1.702  1.883     1.071  1.(^2  1.153  1.265  1.327  1.439
50      Multiple Reg    1.010  1.000  1.000  1.010  1.090  1.230     1.053  1.096  1.266  1.372  1.649  1.851     1.041  1.010  1.061  1.122  1.245  1.204
50      Listwise Del    1.100  1.040  1.150  1.270  1.360  1.500     1.064  1.106  1.202  1.319  1.532  1.660     1.071  1.071  1.265  1.347  1.612  1.510
50      Pairwise Del    1.010  1.000  1.000  0.990  1.010  1.040     1.043  1.043  1.138  1.138  1.340  1.245     1.051  1.031  1.153  1.153  1.316  1.276

100     Mean Sub        1.015  1.000  1.000  1.000  1.000  1.031     0.959  0.932  0.946  0.932  1.000  0.892     1.000  1.027  1.053  1.053  1.067  1.CK0
100     Simple Reg      1.000  1.000  1.046  1.092  1.200  1.400     1.054  1.081  1.135  1.284  1.716  2.000     1.013  1.107  1.053  1.267  1.413  1.347
100     Multiple Reg    1.000  1.031  1.077  1.077  1.200  1.323     1.041  1.095  1.149  1.257  1.703  1.905     1.000  1.027  1.027  1.080  1.107  1.133
100     Listwise Del    1.046  1.215  1.231  1.385  1.431  1.662     1.027  1.054  1.149  1.284  1.608  1.662     1.040  1.093  1.240  1.280  1.413  1.400
100     Pairwise Del    1.000  1.031  1.031  1.031  1.077  1.092     1.000  1.014  1.027  1.095  1.297  1.257     1.013  1.067  1.107  1.107  1.307  1.293

200     Mean Sub        0.977  1.000  0.953  0.977  0.953  0.977     1.000  0.982  0.982  0.964  1.000  1.018     0.982  1.036  1.089  1.107  1.107  1.143
200     Simple Reg      0.977  1.000  1.023  1.000  1.000  1.047     1.036  1.127  1.218  1.309  1.600  1.873     1.000  1.018  1.054  1.161  1.304  1.482
200     Multiple Reg    1.000  1.047  1.023  1.186  1.279  1.349     1.055  1.145  1.236  1.309  1.636  1.945     0.982  0.964  1.036  1.125  1.232  1.250
200     Listwise Del    1.047  1.163  1.233  1.372  1.535  1.698     1.036  1.091  1.182  1.236  1.418  1.491     1.000  1.036  1.232  1.268  1.411  1.464
200     Pairwise Del    1.000  1.023  1.000  1.070  1.070  1.047     1.018  1.055  1.091  1.109  1.273  1.382     0.982  1.018  1.143  1.161  1.232  1.304
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Tabl« 11 

Ratios of Within-cell Standard Deviation of the Regression Weight for the Variable With Missing Data Under Missing Data Treatments to the Within-cell Standard Deviation Under Complete Data Conditions.

Sample  Missing Data   Achievement Data                             Psychological Trait Data                     Likert Rating Data
Size    Treatment      Percent Missing                              Percent Missing                              Percent Missing
                          10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%       10%    20%    30%    40%    50%    60%

50      Mean Sub        0.951  0.902  0.883  0.761  0.816  0.718     1.018  0.946  1.009  0.937  1.009  1.072     0.988  0.721  0.655  0.600  0.515  0.475
50      Simple Reg      1.037  1.092  1.294  1.429  1.656  1.767     1.108  1.189  1.450  1.523  1.775  2.279     1.092  1.07V  1.221  1.535  1.408  1.700
50      Multiple Reg    1.129  1.270  1.748  2.000  2.528  3.411     1.090  1.144  1.378  1.450  1.622  2.586     1.165  1.204  1.400  1.604  1.879  2.733
50      Listwise Del    1.025  1.074  1.313  1.294  1.558  1.675     1.072  1.063  1.225  1.243  1.351  1.712     1.083  1.067  1.208  1.538  1.455  1.629
50      Pairwise Del    1.037  1.178  1.521  1.650  1.804  1.883     1.045  1.063  1.254  1.216  1.441  1.712     1.271  1.450  1.546  1.700  2.521  2.996

100     Mean Sub        0.974  0.906  0.812  0.846  0.769  0.786     0.913  1.000  1.000  0.975  1.075  1.025     0.746  0.642  0.570  0.523  0.415  0.409
100     Simple Reg      1.034  1.043  1.222  1.368  1.598  2.026     1.015  1.163  1.325  1.450  1.925  2.565     1.021  1.057  1.057  1.171  1.275  1.528
100     Multiple Reg    1.154  1.274  1.658  1.889  2.393  3.197     1.025  1.163  1.288  1.338  1.775  2.050     1.041  1.187  1.275  1.420  1.446  2.269
100     Listwise Del    1.051  1.060  1.256  1.308  1.419  1.761     0.975  1.058  1.188  1.200  1.463  1.613     1.026  1.067  1.062  1.155  1.238  1.513
100     Pairwise Del    1.128  1.154  1.333  1.453  1.573  1.915     0.975  1.088  1.150  1.188  1.488  1.638     0.990  1.249  1.368  1.435  1.793  2.525

200     Mean Sub        0.957  0.947  0.840  0.745  0.755  0.723     1.000  0.966  1.052  0.914  0.985  1.034     0.891  0.736  0.597  0.512  0.558  0.405
200     Simple Reg      1.074  1.106  1.149  1.287  1.383  1.596     1.052  1.121  1.379  1.579  1.655  1.862     1.062  1.085  1.217  1.253  1.434  1.405
200     Multiple Reg    1.149  1.266  1.500  2.043  2.330  2.947     1.086  1.155  1.362  1.597  1.586  1.810     1.109  1.240  1.502  1.457  1.715  1.829
200     Listwise Del    1.074  1.117  1.170  1.383  1.500  1.658     1.052  1.086  1.224  1.190  1.562  1.569     1.070  1.095  1.235  1.225  1.442  1.549
200     Pairwise Del    1.074  1.149  1.287  1.426  1.543  1.660     1.052  1.069  1.224  1.190  1.328  1.517     1.140  1.535  1.279  1.566  1.829  1.922
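
The standard deviation ratios in these tables appear to be each treatment's within-cell standard deviation divided by the standard deviation obtained with no data missing. That reading is an assumption checked against the tabled values, and the function name below is illustrative. A minimal sketch, using the standard deviations tabulated earlier for sample size 100 under simple regression imputation (achievement data, regression weight for the variable without missing data):

```python
def sd_ratios(standard_deviations):
    """Divide each missing-data SD by the complete-data SD (first element)."""
    complete_sd = standard_deviations[0]
    return [round(sd / complete_sd, 3) for sd in standard_deviations[1:]]


# Standard deviations at 0%, 10%, 20%, 30%, 40%, 50% and 60% missing data.
simple_reg_sd_n100 = [0.102, 0.109, 0.111, 0.128, 0.138, 0.161, 0.193]

ratios = sd_ratios(simple_reg_sd_n100)
print(ratios)
```

The resulting ratios (1.069, 1.088, 1.255, 1.353, 1.578, 1.892) agree with the row reported for this treatment in Table 12. A ratio above 1 indicates that the treatment made the estimate more variable across samples than it was with complete data.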



Table 12

Ratios of Within-cell Standard Deviation of the Regression Weight for the Variable Without Missing Data Under Missing Data Treatments to the Within-cell Standard Deviation Under Complete Data Conditions.



Sample  Missing Data
Size    Treatment


Achievement Data


Psychological Trait Data


Likert Rating Data


Percent Missing


Percent Missing


Percent Missing


10% 20% 30% 40% 50% 60%


10% 20% 30% 40% 50% 60%


10% 20% 30% 40% 50% 60%


50 Mean Sub
50 Simple Reg
50 Multiple Reg
50 Listwise Del
50 Pairwise Del


0.918 0.862 0.761 0.660 0.686 0.566
1.031 1.063 1.233 1.340 1.541 1.616
1.113 1.220 1.642 1.931 2.390 3.258
1.050 1.038 1.258 1.233 1.547 1.635
1.013 1.145 1.421 1.535 1.629 1.679


0.993 1.014 0.979 1.021 1.021 1.048
0.993 1.014 0.966 1.000 0.993 1.116
1.000 1.048 1.021 1.144 1.199 1.678
1.055 1.144 1.219 1.247 1.370 1.582
0.993 1.027 0.993 1.082 1.116 1.240


0.984 0.733 0.617 0.588 0.457 0.449
1.078 1.062 1.193 1.263 1.305 1.617
1.144 1.202 1.374 1.547 1.782 2.679
1.070 1.074 1.198 1.354 1.424 1.634
1.243 1.428 1.473 1.601 2.337 2.905


100 Mean Sub 
100 Simple Reg 
100 Multiple Reg 
100 Listwise Del 
100 Pairwise Del 


0.951 0.863 0.765 0.706 0.676 0.559
1.069 1.088 1.255 1.353 1.578 1.892
1.186 1.275 1.657 1.912 2.431 3.206
1.059 1.108 1.314 1.402 1.490 1.824
1.147 1.127 1.294 1.402 1.500 1.765


0.989 1.033 1.011 1.022 1.000 1.067
0.978 1.022 1.011 1.022 1.011 1.178
1.000 1.056 1.056 1.111 1.233 1.544
1.044 1.133 1.178 1.356 1.544 1.778
0.989 1.044 1.033 1.056 1.056 1.222


0.765 0.648 0.526 0.500 0.413 0.398 
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